Thread: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
Hi,
PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
We have noticed huge difference interms of execution plan ( response time) , When we pass the direct values Vs inner query to IN clause.
High level details of the use case are as follows
- As part of the SQL there are 2 tables named Process_instance (master) and Process_activity ( child)
- Wanted to fetch TOP 50 rows from Process_activity table for the given values of the Process_instance.
- When we used Inner Join / Inner query ( query1) between parent table and child table , LIMIT is not really taking in to account. Instead it is fetching more rows and columns that required, and finally limiting the result
Query1
web_1=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id FROM process_activity pa WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 00:00:00' AND pa.process_instance_id in (SELECT pi.process_instance_id FROM process_instance pi WHERE pi.user_id = '317079413683604' AND pi.app_id = '427380312000560') ORDER BY pa.process_instance_id,pa.created limit 50;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1071.47..1071.55 rows=31 width=24) (actual time=85.958..85.991 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43065
-> Sort (cost=1071.47..1071.55 rows=31 width=24) (actual time=85.956..85.971 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Sort Key: pa.process_instance_id, pa.created
Sort Method: top-N heapsort Memory: 28kB
Buffers: shared hit=43065
-> Nested Loop (cost=1.14..1070.70 rows=31 width=24) (actual time=0.031..72.183 rows=46992 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43065
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.010..0.013 rows=2 loops=1)
Output: pi.process_instance_id
Index Cond: (pi.user_id = '317079413683604'::bigint)
Filter: (pi.app_id = '427380312000560'::bigint)
Buffers: shared hit=5
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1053.80 rows=1425 width=24) (actual time=0.015..20.702 rows=23496 loops=2)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_response_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.market
ing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_translate, pa.in_reply_to
Index Cond: ((pa.process_instance_id = pi.process_instance_id) AND (pa.app_id = '427380312000560'::bigint) AND (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=43060
Planning time: 0.499 ms
Execution time: 86.040 ms
(22 rows)
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1071.47..1071.55 rows=31 width=24) (actual time=85.958..85.991 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43065
-> Sort (cost=1071.47..1071.55 rows=31 width=24) (actual time=85.956..85.971 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Sort Key: pa.process_instance_id, pa.created
Sort Method: top-N heapsort Memory: 28kB
Buffers: shared hit=43065
-> Nested Loop (cost=1.14..1070.70 rows=31 width=24) (actual time=0.031..72.183 rows=46992 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43065
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.010..0.013 rows=2 loops=1)
Output: pi.process_instance_id
Index Cond: (pi.user_id = '317079413683604'::bigint)
Filter: (pi.app_id = '427380312000560'::bigint)
Buffers: shared hit=5
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1053.80 rows=1425 width=24) (actual time=0.015..20.702 rows=23496 loops=2)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_response_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.market
ing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_translate, pa.in_reply_to
Index Cond: ((pa.process_instance_id = pi.process_instance_id) AND (pa.app_id = '427380312000560'::bigint) AND (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=43060
Planning time: 0.499 ms
Execution time: 86.040 ms
(22 rows)
Query 2
web_1=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS m_process_activity_id FROM process_activity m WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 00:00:00' AND pa.process_instance_id in (240117466018927,325820556706970,433008275197305) ORDER BY pa.process_instance_id,pa.created limit 50;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.70..37.66 rows=50 width=24) (actual time=0.023..0.094 rows=50 loops=1)
Output: process_activity_id, process_instance_id, created
Buffers: shared hit=50
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..3124.97 rows=4226 width=24) (actual time=0.022..0.079 rows=50 loops=1)
Output: process_activity_id, process_instance_id, created
Index Cond: ((pa.process_instance_id = ANY ('{140117466018927,225820556706970,233008275197305}'::bigint[])) AND (pa.app_id = '427380312000560'::bigint) AND (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=50
Planning time: 0.167 ms
Execution time: 0.137 ms
(9 rows)
web_1=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS m_process_activity_id FROM process_activity m WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 00:00:00' AND pa.process_instance_id in (240117466018927,325820556706970,433008275197305) ORDER BY pa.process_instance_id,pa.created limit 50;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.70..37.66 rows=50 width=24) (actual time=0.023..0.094 rows=50 loops=1)
Output: process_activity_id, process_instance_id, created
Buffers: shared hit=50
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..3124.97 rows=4226 width=24) (actual time=0.022..0.079 rows=50 loops=1)
Output: process_activity_id, process_instance_id, created
Index Cond: ((pa.process_instance_id = ANY ('{140117466018927,225820556706970,233008275197305}'::bigint[])) AND (pa.app_id = '427380312000560'::bigint) AND (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=50
Planning time: 0.167 ms
Execution time: 0.137 ms
(9 rows)
Can someone explain
- Why It is fetching more columns and more rows, incase of inner query ?
- Is there any option to really limit values with INNER JOIN, INNER query ? If yes, can you please share information on this ?
Thanks in advance for your time and suggestions.
Regards, Amar
Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
From
Adrian Klaver
Date:
On 5/7/20 4:19 AM, Amarendra Konda wrote: > Hi, > > PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled > by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit > > We have noticed huge difference interms of execution plan ( response > time) , When we pass the direct values Vs inner query to IN clause. > > High level details of the use case are as follows > > * As part of the SQL there are 2 tables named Process_instance > (master) and Process_activity ( child) > * Wanted to fetch TOP 50 rows from Process_activity table for the > given values of the Process_instance. > * When we used Inner Join / Inner query ( query1) between parent > table and child table , LIMIT is not really taking in to account. > Instead it is fetching more rows and columns that required, and > finally limiting the result It is doing what you told it to do which is SELECT all process_instance_i's for user_id='317079413683604' and app_id = '427380312000560' and then filtering further. I am going to guess that if you run the inner query alone you will find it returns ~23496 rows. You might have better results if you an actual join between process_activity and process_instance. Something like below(obviously not tested): SELECT pa.process_activity_id FROM process_activity pa JOIN process_instance pi ON pa.process_instance_id = pi.process_instance_id WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 00:00:00' AND pi.user_id = '317079413683604' ORDER BY pa.process_instance_id, pa.created LIMIT 50; The second query is not equivalent as you are not filtering on user_id and you are filtering on only three process_instance_id's. > * > > > *Query1* > > web_1=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT > pa.process_activity_id FROM process_activity pa WHERE pa.app_id = > '427380312000560' AND pa.created > '1970-01-01 00:00:00' AND > pa.process_instance_id in *_(SELECT pi.process_instance_id FROM > process_instance pi WHERE pi.user_id = '317079413683604' AND pi.app_id = > '427380312000560')_* ORDER BY pa.process_instance_id,pa.created limit 50; > > > QUERY PLAN > > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > Limit (cost=1071.47..1071.55 rows=31 width=24) (actual > time=85.958..85.991 rows=50 loops=1) > Output: pa.process_activity_id, pa.process_instance_id, pa.created > Buffers: shared hit=43065 > -> Sort (cost=1071.47..1071.55 rows=31 width=24) (actual > time=85.956..85.971 rows=50 loops=1) > Output: pa.process_activity_id, pa.process_instance_id, pa.created > Sort Key: pa.process_instance_id, pa.created > Sort Method: top-N heapsort Memory: 28kB > Buffers: shared hit=43065 > -> Nested Loop (cost=1.14..1070.70 rows=31 width=24) (actual > time=0.031..72.183 rows=46992 loops=1) > Output: pa.process_activity_id, pa.process_instance_id, > pa.created > Buffers: shared hit=43065 > -> Index Scan using fki_conv_konotor_user_user_id on > public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual > time=0.010..0.013 rows=2 loops=1) > Output: pi.process_instance_id > Index Cond: (pi.user_id = '317079413683604'::bigint) > Filter: (pi.app_id = '427380312000560'::bigint) > Buffers: shared hit=5 > -> Index Scan using > process_activity_process_instance_id_app_id_created_idx on > public.process_activity pa (cost=0.70..1053.80 rows=1425 width=24) > (actual time=0.015..20.702 rows=*23496* loops=2) > * Output: pa.process_activity_id, pa.process_activity_type, > pa.voice_url, pa.process_activity_user_id, pa.app_id, > pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, > pa.label_category_id, pa.label_id, pa.csat_response_id, > pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.market > ing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, > pa.internal_meta, pa.interaction_id, pa.do_not_translate, > pa.should_translate, pa.in_reply_to* > Index Cond: ((pa.process_instance_id = > pi.process_instance_id) AND (pa.app_id = '427380312000560'::bigint) AND > (pa.created > '1970-01-01 00:00:00'::timestamp without time zone)) > Buffers: shared hit=43060 > Planning time: 0.499 ms > Execution time: 86.040 ms > (22 rows) > > *_Query 2_* > > web_1=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT > pa.process_activity_id AS m_process_activity_id FROM process_activity m > WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 > 00:00:00' AND pa.process_instance_id in > (*240117466018927,325820556706970,433008275197305*) ORDER BY > pa.process_instance_id,pa.created limit 50; > > QUERY PLAN > --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Limit (cost=0.70..37.66 rows=50 width=24) (actual time=0.023..0.094 > rows=50 loops=1) > Output: process_activity_id, process_instance_id, created > Buffers: shared hit=50 > -> Index Scan using > process_activity_process_instance_id_app_id_created_idx on > public.process_activity pa (cost=0.70..3124.97 rows=4226 width=24) > (actual time=0.022..0.079 *rows=50* loops=1) > Output: process_activity_id, process_instance_id, created > Index Cond: ((pa.process_instance_id = ANY > ('{140117466018927,225820556706970,233008275197305}'::bigint[])) AND > (pa.app_id = '427380312000560'::bigint) AND (pa.created > '1970-01-01 > 00:00:00'::timestamp without time zone)) > Buffers: shared hit=50 > Planning time: 0.167 ms > Execution time: 0.137 ms > (9 rows) > > > Can someone explain > > * Why It is fetching more columns and more rows, incase of inner query ? > * Is there any option to really limit values with INNER JOIN, INNER > query ? If yes, can you please share information on this ? > > Thanks in advance for your time and suggestions. > > Regards, Amar -- Adrian Klaver adrian.klaver@aklaver.com
Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
From
Adrian Klaver
Date:
On 5/7/20 4:19 AM, Amarendra Konda wrote: > Hi, > > PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled > by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit > > We have noticed huge difference interms of execution plan ( response > time) , When we pass the direct values Vs inner query to IN clause. > > High level details of the use case are as follows > > * As part of the SQL there are 2 tables named Process_instance > (master) and Process_activity ( child) > * Wanted to fetch TOP 50 rows from Process_activity table for the > given values of the Process_instance. > * When we used Inner Join / Inner query ( query1) between parent > table and child table , LIMIT is not really taking in to account. > Instead it is fetching more rows and columns that required, and > finally limiting the result It is doing what you told it to do which is SELECT all process_instance_i's for user_id='317079413683604' and app_id = '427380312000560' and then filtering further. I am going to guess that if you run the inner query alone you will find it returns ~23496 rows. You might have better results if you an actual join between process_activity and process_instance. Something like below(obviously not tested): SELECT pa.process_activity_id FROM process_activity pa JOIN process_instance pi ON pa.process_instance_id = pi.process_instance_id WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 00:00:00' AND pi.user_id = '317079413683604' ORDER BY pa.process_instance_id, pa.created LIMIT 50; The second query is not equivalent as you are not filtering on user_id and you are filtering on only three process_instance_id's. > * > > > *Query1* > > web_1=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT > pa.process_activity_id FROM process_activity pa WHERE pa.app_id = > '427380312000560' AND pa.created > '1970-01-01 00:00:00' AND > pa.process_instance_id in *_(SELECT pi.process_instance_id FROM > process_instance pi WHERE pi.user_id = '317079413683604' AND pi.app_id = > '427380312000560')_* ORDER BY pa.process_instance_id,pa.created limit 50; > > > QUERY PLAN > > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > Limit (cost=1071.47..1071.55 rows=31 width=24) (actual > time=85.958..85.991 rows=50 loops=1) > Output: pa.process_activity_id, pa.process_instance_id, pa.created > Buffers: shared hit=43065 > -> Sort (cost=1071.47..1071.55 rows=31 width=24) (actual > time=85.956..85.971 rows=50 loops=1) > Output: pa.process_activity_id, pa.process_instance_id, pa.created > Sort Key: pa.process_instance_id, pa.created > Sort Method: top-N heapsort Memory: 28kB > Buffers: shared hit=43065 > -> Nested Loop (cost=1.14..1070.70 rows=31 width=24) (actual > time=0.031..72.183 rows=46992 loops=1) > Output: pa.process_activity_id, pa.process_instance_id, > pa.created > Buffers: shared hit=43065 > -> Index Scan using fki_conv_konotor_user_user_id on > public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual > time=0.010..0.013 rows=2 loops=1) > Output: pi.process_instance_id > Index Cond: (pi.user_id = '317079413683604'::bigint) > Filter: (pi.app_id = '427380312000560'::bigint) > Buffers: shared hit=5 > -> Index Scan using > process_activity_process_instance_id_app_id_created_idx on > public.process_activity pa (cost=0.70..1053.80 rows=1425 width=24) > (actual time=0.015..20.702 rows=*23496* loops=2) > * Output: pa.process_activity_id, pa.process_activity_type, > pa.voice_url, pa.process_activity_user_id, pa.app_id, > pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, > pa.label_category_id, pa.label_id, pa.csat_response_id, > pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.market > ing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, > pa.internal_meta, pa.interaction_id, pa.do_not_translate, > pa.should_translate, pa.in_reply_to* > Index Cond: ((pa.process_instance_id = > pi.process_instance_id) AND (pa.app_id = '427380312000560'::bigint) AND > (pa.created > '1970-01-01 00:00:00'::timestamp without time zone)) > Buffers: shared hit=43060 > Planning time: 0.499 ms > Execution time: 86.040 ms > (22 rows) > > *_Query 2_* > > web_1=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT > pa.process_activity_id AS m_process_activity_id FROM process_activity m > WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 > 00:00:00' AND pa.process_instance_id in > (*240117466018927,325820556706970,433008275197305*) ORDER BY > pa.process_instance_id,pa.created limit 50; > > QUERY PLAN > --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Limit (cost=0.70..37.66 rows=50 width=24) (actual time=0.023..0.094 > rows=50 loops=1) > Output: process_activity_id, process_instance_id, created > Buffers: shared hit=50 > -> Index Scan using > process_activity_process_instance_id_app_id_created_idx on > public.process_activity pa (cost=0.70..3124.97 rows=4226 width=24) > (actual time=0.022..0.079 *rows=50* loops=1) > Output: process_activity_id, process_instance_id, created > Index Cond: ((pa.process_instance_id = ANY > ('{140117466018927,225820556706970,233008275197305}'::bigint[])) AND > (pa.app_id = '427380312000560'::bigint) AND (pa.created > '1970-01-01 > 00:00:00'::timestamp without time zone)) > Buffers: shared hit=50 > Planning time: 0.167 ms > Execution time: 0.137 ms > (9 rows) > > > Can someone explain > > * Why It is fetching more columns and more rows, incase of inner query ? > * Is there any option to really limit values with INNER JOIN, INNER > query ? If yes, can you please share information on this ? > > Thanks in advance for your time and suggestions. > > Regards, Amar -- Adrian Klaver adrian.klaver@aklaver.com
Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
From
"David G. Johnston"
Date:
On Thu, May 7, 2020 at 7:40 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 5/7/20 4:19 AM, Amarendra Konda wrote:
> Hi,
>
> PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled
> by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
>
> We have noticed huge difference interms of execution plan ( response
> time) , When we pass the direct values Vs inner query to IN clause.
>
> High level details of the use case are as follows
>
> * As part of the SQL there are 2 tables named Process_instance
> (master) and Process_activity ( child)
> * Wanted to fetch TOP 50 rows from Process_activity table for the
> given values of the Process_instance.
> * When we used Inner Join / Inner query ( query1) between parent
> table and child table , LIMIT is not really taking in to account.
> Instead it is fetching more rows and columns that required, and
> finally limiting the result
It is doing what you told it to do which is SELECT all
process_instance_i's for user_id='317079413683604' and app_id =
'427380312000560' and then filtering further. I am going to guess that
if you run the inner query alone you will find it returns ~23496 rows.
You might have better results if you an actual join between
process_activity and process_instance. Something like below(obviously
not tested):
What the OP seems to want is a semi-join:
(not tested)
SELECT pa.process_activity_id
FROM process_activity pa WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 00:00:00'
AND EXISTS (
SELECT 1 FROM process_instance pi WHERE pi.app_id = pa.app_id AND pi.user_id = '317079413683604'
)
ORDER BY
pa.process_instance_id,
pa.created limit 50;
I'm unsure exactly how this will impact the plan choice but it should be an improvement, and in any case more correctly defines what it is you are looking for.
David J.
Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
From
"David G. Johnston"
Date:
On Thu, May 7, 2020 at 7:40 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 5/7/20 4:19 AM, Amarendra Konda wrote:
> Hi,
>
> PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled
> by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
>
> We have noticed huge difference interms of execution plan ( response
> time) , When we pass the direct values Vs inner query to IN clause.
>
> High level details of the use case are as follows
>
> * As part of the SQL there are 2 tables named Process_instance
> (master) and Process_activity ( child)
> * Wanted to fetch TOP 50 rows from Process_activity table for the
> given values of the Process_instance.
> * When we used Inner Join / Inner query ( query1) between parent
> table and child table , LIMIT is not really taking in to account.
> Instead it is fetching more rows and columns that required, and
> finally limiting the result
It is doing what you told it to do which is SELECT all
process_instance_i's for user_id='317079413683604' and app_id =
'427380312000560' and then filtering further. I am going to guess that
if you run the inner query alone you will find it returns ~23496 rows.
You might have better results if you an actual join between
process_activity and process_instance. Something like below(obviously
not tested):
What the OP seems to want is a semi-join:
(not tested)
SELECT pa.process_activity_id
FROM process_activity pa WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 00:00:00'
AND EXISTS (
SELECT 1 FROM process_instance pi WHERE pi.app_id = pa.app_id AND pi.user_id = '317079413683604'
)
ORDER BY
pa.process_instance_id,
pa.created limit 50;
I'm unsure exactly how this will impact the plan choice but it should be an improvement, and in any case more correctly defines what it is you are looking for.
David J.
Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
From
Amarendra Konda
Date:
Hi Adrian,
Thanks for the reply. And i have kept latest execution plans, for various SQL statements ( inner join, sub queries and placing values instead of sub query) .
As suggested, tried with INNER JOIN, however result was similar to subquery.
Is there any way we can tell the optimiser to process less number of rows based on the LIMIT value ? ( i.e. may be SQL re-write) ?
INNER SQL
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pi.process_instance_id AS pi_process_instance_id FROM process_instance pi WHERE pi.user_id = '137074931866340' AND pi.app_id = '126502930200650';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.018..0.019 rows=2 loops=1)
Output: process_instance_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
Planning time: 0.119 ms
Execution time: 0.041 ms
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.018..0.019 rows=2 loops=1)
Output: process_instance_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
Planning time: 0.119 ms
Execution time: 0.041 ms
Full query - Sub query
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND pa.process_instance_id in (SELECT pi.process_instance_id AS pi_process_instance_id FROM process_instance pi WHERE pi.user_id = '137074931866340' AND pi.app_id = '126502930200650') ORDER BY pa.process_instance_id, pa.created limit 50;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------
Limit (cost=1072.91..1072.99 rows=31 width=24) (actual time=744.386..744.415 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=3760 read=39316
-> Sort (cost=1072.91..1072.99 rows=31 width=24) (actual time=744.384..744.396 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Sort Key: pa.process_instance_id, pa.created
Sort Method: top-N heapsort Memory: 28kB
Buffers: shared hit=3760 read=39316
-> Nested Loop (cost=1.14..1072.14 rows=31 width=24) (actual time=0.044..727.297 rows=47011 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=3754 read=39316
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.009..0.015 rows=2 loops=1)
Output: pi.process_instance_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1055.22 rows=1427 width=24) (actual time=0.029..349.000 rows=23506 loops=2)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_respons
e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_tr
anslate, pa.in_reply_to
Index Cond: ((pa.process_instance_id = pi.process_instance_id) AND (pa.app_id = '126502930200650'::bigint) AND (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=3749 read=39316
Planning time: 2.547 ms
Execution time: 744.499 ms
(22 rows)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------
Limit (cost=1072.91..1072.99 rows=31 width=24) (actual time=744.386..744.415 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=3760 read=39316
-> Sort (cost=1072.91..1072.99 rows=31 width=24) (actual time=744.384..744.396 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Sort Key: pa.process_instance_id, pa.created
Sort Method: top-N heapsort Memory: 28kB
Buffers: shared hit=3760 read=39316
-> Nested Loop (cost=1.14..1072.14 rows=31 width=24) (actual time=0.044..727.297 rows=47011 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=3754 read=39316
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.009..0.015 rows=2 loops=1)
Output: pi.process_instance_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1055.22 rows=1427 width=24) (actual time=0.029..349.000 rows=23506 loops=2)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_respons
e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_tr
anslate, pa.in_reply_to
Index Cond: ((pa.process_instance_id = pi.process_instance_id) AND (pa.app_id = '126502930200650'::bigint) AND (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=3749 read=39316
Planning time: 2.547 ms
Execution time: 744.499 ms
(22 rows)
Full query - INNER JOIN
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa INNER JOIN process_instance pi ON pi.process_instance_id = pa.process_instance_id AND pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND pi.user_id = '137074931866340' AND pi.app_id = '126502930200650' ORDER BY pa.process_instance_id, pa.created limit 50;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------
Limit (cost=1072.91..1072.99 rows=31 width=24) (actual time=87.803..87.834 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Sort (cost=1072.91..1072.99 rows=31 width=24) (actual time=87.803..87.815 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Sort Key: pa.process_instance_id, pa.created
Sort Method: top-N heapsort Memory: 28kB
Buffers: shared hit=43070
-> Nested Loop (cost=1.14..1072.14 rows=31 width=24) (actual time=0.030..73.847 rows=47011 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.015..0.018 rows=2 loops=1)
Output: pi.process_instance_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1055.22 rows=1427 width=24) (actual time=0.011..21.447 rows=23506 loops=2)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_respons
e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_tr
anslate, pa.in_reply_to
Index Cond: ((pa.process_instance_id = pi.process_instance_id) AND (pa.app_id = '126502930200650'::bigint) AND (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=43065
Planning time: 0.428 ms
Execution time: 87.905 ms
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------
Limit (cost=1072.91..1072.99 rows=31 width=24) (actual time=87.803..87.834 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Sort (cost=1072.91..1072.99 rows=31 width=24) (actual time=87.803..87.815 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Sort Key: pa.process_instance_id, pa.created
Sort Method: top-N heapsort Memory: 28kB
Buffers: shared hit=43070
-> Nested Loop (cost=1.14..1072.14 rows=31 width=24) (actual time=0.030..73.847 rows=47011 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.015..0.018 rows=2 loops=1)
Output: pi.process_instance_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1055.22 rows=1427 width=24) (actual time=0.011..21.447 rows=23506 loops=2)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_respons
e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_tr
anslate, pa.in_reply_to
Index Cond: ((pa.process_instance_id = pi.process_instance_id) AND (pa.app_id = '126502930200650'::bigint) AND (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=43065
Planning time: 0.428 ms
Execution time: 87.905 ms
FULL Query - INNER SQL replaced with result
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS m_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND pa.process_instance_id in (137074941043913,164357609323111) ORDER BY pa.process_instance_id,pa.created limit 50;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------
Limit (cost=0.70..37.65 rows=50 width=24) (actual time=0.016..0.095 rows=50 loops=1)
Output: process_activity_id, process_instance_id, created
Buffers: shared hit=55
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..2100.39 rows=2841 width=24) (actual time=0.015..0.077 rows=50 loops=1)
Output: process_activity_id, process_instance_id, created
Index Cond: ((pa.process_instance_id = ANY ('{137074941043913,164357609323111}'::bigint[])) AND (pa.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time
zone))
Buffers: shared hit=55
Planning time: 1.710 ms
Execution time: 0.147 ms
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------
Limit (cost=0.70..37.65 rows=50 width=24) (actual time=0.016..0.095 rows=50 loops=1)
Output: process_activity_id, process_instance_id, created
Buffers: shared hit=55
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..2100.39 rows=2841 width=24) (actual time=0.015..0.077 rows=50 loops=1)
Output: process_activity_id, process_instance_id, created
Index Cond: ((pa.process_instance_id = ANY ('{137074941043913,164357609323111}'::bigint[])) AND (pa.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time
zone))
Buffers: shared hit=55
Planning time: 1.710 ms
Execution time: 0.147 ms
Regards, Amar
On Thu, May 7, 2020 at 8:10 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 5/7/20 4:19 AM, Amarendra Konda wrote:
> Hi,
>
> PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled
> by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
>
> We have noticed huge difference interms of execution plan ( response
> time) , When we pass the direct values Vs inner query to IN clause.
>
> High level details of the use case are as follows
>
> * As part of the SQL there are 2 tables named Process_instance
> (master) and Process_activity ( child)
> * Wanted to fetch TOP 50 rows from Process_activity table for the
> given values of the Process_instance.
> * When we used Inner Join / Inner query ( query1) between parent
> table and child table , LIMIT is not really taking in to account.
> Instead it is fetching more rows and columns that required, and
> finally limiting the result
It is doing what you told it to do which is SELECT all
process_instance_i's for user_id='317079413683604' and app_id =
'427380312000560' and then filtering further. I am going to guess that
if you run the inner query alone you will find it returns ~23496 rows.
You might have better results if you an actual join between
process_activity and process_instance. Something like below(obviously
not tested):
SELECT
pa.process_activity_id
FROM
process_activity pa
JOIN
process_instance pi
ON
pa.process_instance_id = pi.process_instance_id
WHERE
pa.app_id = '427380312000560'
AND
pa.created > '1970-01-01 00:00:00'
AND
pi.user_id = '317079413683604'
ORDER BY
pa.process_instance_id,
pa.created
LIMIT 50;
The second query is not equivalent as you are not filtering on user_id
and you are filtering on only three process_instance_id's.
> *
>
>
> *Query1*
>
> web_1=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT
> pa.process_activity_id FROM process_activity pa WHERE pa.app_id =
> '427380312000560' AND pa.created > '1970-01-01 00:00:00' AND
> pa.process_instance_id in *_(SELECT pi.process_instance_id FROM
> process_instance pi WHERE pi.user_id = '317079413683604' AND pi.app_id =
> '427380312000560')_* ORDER BY pa.process_instance_id,pa.created limit 50;
>
>
> QUERY PLAN
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=1071.47..1071.55 rows=31 width=24) (actual
> time=85.958..85.991 rows=50 loops=1)
> Output: pa.process_activity_id, pa.process_instance_id, pa.created
> Buffers: shared hit=43065
> -> Sort (cost=1071.47..1071.55 rows=31 width=24) (actual
> time=85.956..85.971 rows=50 loops=1)
> Output: pa.process_activity_id, pa.process_instance_id, pa.created
> Sort Key: pa.process_instance_id, pa.created
> Sort Method: top-N heapsort Memory: 28kB
> Buffers: shared hit=43065
> -> Nested Loop (cost=1.14..1070.70 rows=31 width=24) (actual
> time=0.031..72.183 rows=46992 loops=1)
> Output: pa.process_activity_id, pa.process_instance_id,
> pa.created
> Buffers: shared hit=43065
> -> Index Scan using fki_conv_konotor_user_user_id on
> public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual
> time=0.010..0.013 rows=2 loops=1)
> Output: pi.process_instance_id
> Index Cond: (pi.user_id = '317079413683604'::bigint)
> Filter: (pi.app_id = '427380312000560'::bigint)
> Buffers: shared hit=5
> -> Index Scan using
> process_activity_process_instance_id_app_id_created_idx on
> public.process_activity pa (cost=0.70..1053.80 rows=1425 width=24)
> (actual time=0.015..20.702 rows=*23496* loops=2)
> * Output: pa.process_activity_id, pa.process_activity_type,
> pa.voice_url, pa.process_activity_user_id, pa.app_id,
> pa.process_instance_id, pa.alias, pa.read_by_user, pa.source,
> pa.label_category_id, pa.label_id, pa.csat_response_id,
> pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.market
> ing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment,
> pa.internal_meta, pa.interaction_id, pa.do_not_translate,
> pa.should_translate, pa.in_reply_to*
> Index Cond: ((pa.process_instance_id =
> pi.process_instance_id) AND (pa.app_id = '427380312000560'::bigint) AND
> (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
> Buffers: shared hit=43060
> Planning time: 0.499 ms
> Execution time: 86.040 ms
> (22 rows)
>
> *_Query 2_*
>
> web_1=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT
> pa.process_activity_id AS m_process_activity_id FROM process_activity m
> WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01
> 00:00:00' AND pa.process_instance_id in
> (*240117466018927,325820556706970,433008275197305*) ORDER BY
> pa.process_instance_id,pa.created limit 50;
>
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=0.70..37.66 rows=50 width=24) (actual time=0.023..0.094
> rows=50 loops=1)
> Output: process_activity_id, process_instance_id, created
> Buffers: shared hit=50
> -> Index Scan using
> process_activity_process_instance_id_app_id_created_idx on
> public.process_activity pa (cost=0.70..3124.97 rows=4226 width=24)
> (actual time=0.022..0.079 *rows=50* loops=1)
> Output: process_activity_id, process_instance_id, created
> Index Cond: ((pa.process_instance_id = ANY
> ('{140117466018927,225820556706970,233008275197305}'::bigint[])) AND
> (pa.app_id = '427380312000560'::bigint) AND (pa.created > '1970-01-01
> 00:00:00'::timestamp without time zone))
> Buffers: shared hit=50
> Planning time: 0.167 ms
> Execution time: 0.137 ms
> (9 rows)
>
>
> Can someone explain
>
> * Why It is fetching more columns and more rows, incase of inner query ?
> * Is there any option to really limit values with INNER JOIN, INNER
> query ? If yes, can you please share information on this ?
>
> Thanks in advance for your time and suggestions.
>
> Regards, Amar
--
Adrian Klaver
adrian.klaver@aklaver.com
Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
From
Amarendra Konda
Date:
Hi Adrian,
Thanks for the reply. And i have kept latest execution plans, for various SQL statements ( inner join, sub queries and placing values instead of sub query) .
As suggested, tried with INNER JOIN, however result was similar to subquery.
Is there any way we can tell the optimiser to process less number of rows based on the LIMIT value ? ( i.e. may be SQL re-write) ?
INNER SQL
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pi.process_instance_id AS pi_process_instance_id FROM process_instance pi WHERE pi.user_id = '137074931866340' AND pi.app_id = '126502930200650';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.018..0.019 rows=2 loops=1)
Output: process_instance_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
Planning time: 0.119 ms
Execution time: 0.041 ms
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.018..0.019 rows=2 loops=1)
Output: process_instance_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
Planning time: 0.119 ms
Execution time: 0.041 ms
Full query - Sub query
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND pa.process_instance_id in (SELECT pi.process_instance_id AS pi_process_instance_id FROM process_instance pi WHERE pi.user_id = '137074931866340' AND pi.app_id = '126502930200650') ORDER BY pa.process_instance_id, pa.created limit 50;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------
Limit (cost=1072.91..1072.99 rows=31 width=24) (actual time=744.386..744.415 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=3760 read=39316
-> Sort (cost=1072.91..1072.99 rows=31 width=24) (actual time=744.384..744.396 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Sort Key: pa.process_instance_id, pa.created
Sort Method: top-N heapsort Memory: 28kB
Buffers: shared hit=3760 read=39316
-> Nested Loop (cost=1.14..1072.14 rows=31 width=24) (actual time=0.044..727.297 rows=47011 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=3754 read=39316
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.009..0.015 rows=2 loops=1)
Output: pi.process_instance_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1055.22 rows=1427 width=24) (actual time=0.029..349.000 rows=23506 loops=2)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_respons
e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_tr
anslate, pa.in_reply_to
Index Cond: ((pa.process_instance_id = pi.process_instance_id) AND (pa.app_id = '126502930200650'::bigint) AND (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=3749 read=39316
Planning time: 2.547 ms
Execution time: 744.499 ms
(22 rows)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------
Limit (cost=1072.91..1072.99 rows=31 width=24) (actual time=744.386..744.415 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=3760 read=39316
-> Sort (cost=1072.91..1072.99 rows=31 width=24) (actual time=744.384..744.396 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Sort Key: pa.process_instance_id, pa.created
Sort Method: top-N heapsort Memory: 28kB
Buffers: shared hit=3760 read=39316
-> Nested Loop (cost=1.14..1072.14 rows=31 width=24) (actual time=0.044..727.297 rows=47011 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=3754 read=39316
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.009..0.015 rows=2 loops=1)
Output: pi.process_instance_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1055.22 rows=1427 width=24) (actual time=0.029..349.000 rows=23506 loops=2)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_respons
e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_tr
anslate, pa.in_reply_to
Index Cond: ((pa.process_instance_id = pi.process_instance_id) AND (pa.app_id = '126502930200650'::bigint) AND (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=3749 read=39316
Planning time: 2.547 ms
Execution time: 744.499 ms
(22 rows)
Full query - INNER JOIN
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa INNER JOIN process_instance pi ON pi.process_instance_id = pa.process_instance_id AND pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND pi.user_id = '137074931866340' AND pi.app_id = '126502930200650' ORDER BY pa.process_instance_id, pa.created limit 50;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------
Limit (cost=1072.91..1072.99 rows=31 width=24) (actual time=87.803..87.834 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Sort (cost=1072.91..1072.99 rows=31 width=24) (actual time=87.803..87.815 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Sort Key: pa.process_instance_id, pa.created
Sort Method: top-N heapsort Memory: 28kB
Buffers: shared hit=43070
-> Nested Loop (cost=1.14..1072.14 rows=31 width=24) (actual time=0.030..73.847 rows=47011 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.015..0.018 rows=2 loops=1)
Output: pi.process_instance_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1055.22 rows=1427 width=24) (actual time=0.011..21.447 rows=23506 loops=2)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_respons
e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_tr
anslate, pa.in_reply_to
Index Cond: ((pa.process_instance_id = pi.process_instance_id) AND (pa.app_id = '126502930200650'::bigint) AND (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=43065
Planning time: 0.428 ms
Execution time: 87.905 ms
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------
Limit (cost=1072.91..1072.99 rows=31 width=24) (actual time=87.803..87.834 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Sort (cost=1072.91..1072.99 rows=31 width=24) (actual time=87.803..87.815 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Sort Key: pa.process_instance_id, pa.created
Sort Method: top-N heapsort Memory: 28kB
Buffers: shared hit=43070
-> Nested Loop (cost=1.14..1072.14 rows=31 width=24) (actual time=0.030..73.847 rows=47011 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.015..0.018 rows=2 loops=1)
Output: pi.process_instance_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1055.22 rows=1427 width=24) (actual time=0.011..21.447 rows=23506 loops=2)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_respons
e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_tr
anslate, pa.in_reply_to
Index Cond: ((pa.process_instance_id = pi.process_instance_id) AND (pa.app_id = '126502930200650'::bigint) AND (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=43065
Planning time: 0.428 ms
Execution time: 87.905 ms
FULL Query - INNER SQL replaced with result
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS m_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND pa.process_instance_id in (137074941043913,164357609323111) ORDER BY pa.process_instance_id,pa.created limit 50;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------
Limit (cost=0.70..37.65 rows=50 width=24) (actual time=0.016..0.095 rows=50 loops=1)
Output: process_activity_id, process_instance_id, created
Buffers: shared hit=55
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..2100.39 rows=2841 width=24) (actual time=0.015..0.077 rows=50 loops=1)
Output: process_activity_id, process_instance_id, created
Index Cond: ((pa.process_instance_id = ANY ('{137074941043913,164357609323111}'::bigint[])) AND (pa.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time
zone))
Buffers: shared hit=55
Planning time: 1.710 ms
Execution time: 0.147 ms
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------
Limit (cost=0.70..37.65 rows=50 width=24) (actual time=0.016..0.095 rows=50 loops=1)
Output: process_activity_id, process_instance_id, created
Buffers: shared hit=55
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..2100.39 rows=2841 width=24) (actual time=0.015..0.077 rows=50 loops=1)
Output: process_activity_id, process_instance_id, created
Index Cond: ((pa.process_instance_id = ANY ('{137074941043913,164357609323111}'::bigint[])) AND (pa.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time
zone))
Buffers: shared hit=55
Planning time: 1.710 ms
Execution time: 0.147 ms
Regards, Amar
On Thu, May 7, 2020 at 8:10 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 5/7/20 4:19 AM, Amarendra Konda wrote:
> Hi,
>
> PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled
> by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
>
> We have noticed huge difference interms of execution plan ( response
> time) , When we pass the direct values Vs inner query to IN clause.
>
> High level details of the use case are as follows
>
> * As part of the SQL there are 2 tables named Process_instance
> (master) and Process_activity ( child)
> * Wanted to fetch TOP 50 rows from Process_activity table for the
> given values of the Process_instance.
> * When we used Inner Join / Inner query ( query1) between parent
> table and child table , LIMIT is not really taking in to account.
> Instead it is fetching more rows and columns that required, and
> finally limiting the result
It is doing what you told it to do which is SELECT all
process_instance_i's for user_id='317079413683604' and app_id =
'427380312000560' and then filtering further. I am going to guess that
if you run the inner query alone you will find it returns ~23496 rows.
You might have better results if you an actual join between
process_activity and process_instance. Something like below(obviously
not tested):
SELECT
pa.process_activity_id
FROM
process_activity pa
JOIN
process_instance pi
ON
pa.process_instance_id = pi.process_instance_id
WHERE
pa.app_id = '427380312000560'
AND
pa.created > '1970-01-01 00:00:00'
AND
pi.user_id = '317079413683604'
ORDER BY
pa.process_instance_id,
pa.created
LIMIT 50;
The second query is not equivalent as you are not filtering on user_id
and you are filtering on only three process_instance_id's.
> *
>
>
> *Query1*
>
> web_1=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT
> pa.process_activity_id FROM process_activity pa WHERE pa.app_id =
> '427380312000560' AND pa.created > '1970-01-01 00:00:00' AND
> pa.process_instance_id in *_(SELECT pi.process_instance_id FROM
> process_instance pi WHERE pi.user_id = '317079413683604' AND pi.app_id =
> '427380312000560')_* ORDER BY pa.process_instance_id,pa.created limit 50;
>
>
> QUERY PLAN
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=1071.47..1071.55 rows=31 width=24) (actual
> time=85.958..85.991 rows=50 loops=1)
> Output: pa.process_activity_id, pa.process_instance_id, pa.created
> Buffers: shared hit=43065
> -> Sort (cost=1071.47..1071.55 rows=31 width=24) (actual
> time=85.956..85.971 rows=50 loops=1)
> Output: pa.process_activity_id, pa.process_instance_id, pa.created
> Sort Key: pa.process_instance_id, pa.created
> Sort Method: top-N heapsort Memory: 28kB
> Buffers: shared hit=43065
> -> Nested Loop (cost=1.14..1070.70 rows=31 width=24) (actual
> time=0.031..72.183 rows=46992 loops=1)
> Output: pa.process_activity_id, pa.process_instance_id,
> pa.created
> Buffers: shared hit=43065
> -> Index Scan using fki_conv_konotor_user_user_id on
> public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual
> time=0.010..0.013 rows=2 loops=1)
> Output: pi.process_instance_id
> Index Cond: (pi.user_id = '317079413683604'::bigint)
> Filter: (pi.app_id = '427380312000560'::bigint)
> Buffers: shared hit=5
> -> Index Scan using
> process_activity_process_instance_id_app_id_created_idx on
> public.process_activity pa (cost=0.70..1053.80 rows=1425 width=24)
> (actual time=0.015..20.702 rows=*23496* loops=2)
> * Output: pa.process_activity_id, pa.process_activity_type,
> pa.voice_url, pa.process_activity_user_id, pa.app_id,
> pa.process_instance_id, pa.alias, pa.read_by_user, pa.source,
> pa.label_category_id, pa.label_id, pa.csat_response_id,
> pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.market
> ing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment,
> pa.internal_meta, pa.interaction_id, pa.do_not_translate,
> pa.should_translate, pa.in_reply_to*
> Index Cond: ((pa.process_instance_id =
> pi.process_instance_id) AND (pa.app_id = '427380312000560'::bigint) AND
> (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
> Buffers: shared hit=43060
> Planning time: 0.499 ms
> Execution time: 86.040 ms
> (22 rows)
>
> *_Query 2_*
>
> web_1=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT
> pa.process_activity_id AS m_process_activity_id FROM process_activity m
> WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01
> 00:00:00' AND pa.process_instance_id in
> (*240117466018927,325820556706970,433008275197305*) ORDER BY
> pa.process_instance_id,pa.created limit 50;
>
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=0.70..37.66 rows=50 width=24) (actual time=0.023..0.094
> rows=50 loops=1)
> Output: process_activity_id, process_instance_id, created
> Buffers: shared hit=50
> -> Index Scan using
> process_activity_process_instance_id_app_id_created_idx on
> public.process_activity pa (cost=0.70..3124.97 rows=4226 width=24)
> (actual time=0.022..0.079 *rows=50* loops=1)
> Output: process_activity_id, process_instance_id, created
> Index Cond: ((pa.process_instance_id = ANY
> ('{140117466018927,225820556706970,233008275197305}'::bigint[])) AND
> (pa.app_id = '427380312000560'::bigint) AND (pa.created > '1970-01-01
> 00:00:00'::timestamp without time zone))
> Buffers: shared hit=50
> Planning time: 0.167 ms
> Execution time: 0.137 ms
> (9 rows)
>
>
> Can someone explain
>
> * Why It is fetching more columns and more rows, incase of inner query ?
> * Is there any option to really limit values with INNER JOIN, INNER
> query ? If yes, can you please share information on this ?
>
> Thanks in advance for your time and suggestions.
>
> Regards, Amar
--
Adrian Klaver
adrian.klaver@aklaver.com
Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
From
Amarendra Konda
Date:
Hi David,
Thanks for the reply.This has optimized number of rows.
Can you please explain, why it is getting more columns in output, even though we have asked for only one column ?
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS ( SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND pi.user_id = '137074931866340') ORDER BY pa.process_instance_id,m.created limit 50;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------
Limit (cost=1.14..37.39 rows=50 width=24) (actual time=821.283..891.629 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=274950
-> Nested Loop Semi Join (cost=1.14..266660108.78 rows=367790473 width=24) (actual time=821.282..891.607 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=274950
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..262062725.21 rows=367790473 width=32) (actual time=821.253..891.517 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_response_id,
m.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_translat
e, pa.in_reply_to
Index Cond: ((m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=274946
-> Materialize (cost=0.43..2.66 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=50)
Output: pi.app_id
Buffers: shared hit=4
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.020..0.020 rows=1 loops=1)
Output: pi.app_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=4
Planning time: 0.297 ms
Execution time: 891.686 ms
(20 rows)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------
Limit (cost=1.14..37.39 rows=50 width=24) (actual time=821.283..891.629 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=274950
-> Nested Loop Semi Join (cost=1.14..266660108.78 rows=367790473 width=24) (actual time=821.282..891.607 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=274950
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..262062725.21 rows=367790473 width=32) (actual time=821.253..891.517 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_response_id,
m.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_translat
e, pa.in_reply_to
Index Cond: ((m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=274946
-> Materialize (cost=0.43..2.66 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=50)
Output: pi.app_id
Buffers: shared hit=4
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.020..0.020 rows=1 loops=1)
Output: pi.app_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=4
Planning time: 0.297 ms
Execution time: 891.686 ms
(20 rows)
On Thu, May 7, 2020 at 9:17 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Thu, May 7, 2020 at 7:40 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:On 5/7/20 4:19 AM, Amarendra Konda wrote:
> Hi,
>
> PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled
> by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
>
> We have noticed huge difference interms of execution plan ( response
> time) , When we pass the direct values Vs inner query to IN clause.
>
> High level details of the use case are as follows
>
> * As part of the SQL there are 2 tables named Process_instance
> (master) and Process_activity ( child)
> * Wanted to fetch TOP 50 rows from Process_activity table for the
> given values of the Process_instance.
> * When we used Inner Join / Inner query ( query1) between parent
> table and child table , LIMIT is not really taking in to account.
> Instead it is fetching more rows and columns that required, and
> finally limiting the result
It is doing what you told it to do which is SELECT all
process_instance_i's for user_id='317079413683604' and app_id =
'427380312000560' and then filtering further. I am going to guess that
if you run the inner query alone you will find it returns ~23496 rows.
You might have better results if you an actual join between
process_activity and process_instance. Something like below(obviously
not tested):What the OP seems to want is a semi-join:(not tested)SELECT pa.process_activity_idFROM process_activity pa WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 00:00:00'AND EXISTS (SELECT 1 FROM process_instance pi WHERE pi.app_id = pa.app_id AND pi.user_id = '317079413683604')ORDER BYpa.process_instance_id,pa.created limit 50;I'm unsure exactly how this will impact the plan choice but it should be an improvement, and in any case more correctly defines what it is you are looking for.David J.
Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
From
Amarendra Konda
Date:
Hi David,
Thanks for the reply.This has optimized number of rows.
Can you please explain, why it is getting more columns in output, even though we have asked for only one column ?
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS ( SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND pi.user_id = '137074931866340') ORDER BY pa.process_instance_id,m.created limit 50;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------
Limit (cost=1.14..37.39 rows=50 width=24) (actual time=821.283..891.629 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=274950
-> Nested Loop Semi Join (cost=1.14..266660108.78 rows=367790473 width=24) (actual time=821.282..891.607 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=274950
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..262062725.21 rows=367790473 width=32) (actual time=821.253..891.517 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_response_id,
m.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_translat
e, pa.in_reply_to
Index Cond: ((m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=274946
-> Materialize (cost=0.43..2.66 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=50)
Output: pi.app_id
Buffers: shared hit=4
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.020..0.020 rows=1 loops=1)
Output: pi.app_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=4
Planning time: 0.297 ms
Execution time: 891.686 ms
(20 rows)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------
Limit (cost=1.14..37.39 rows=50 width=24) (actual time=821.283..891.629 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=274950
-> Nested Loop Semi Join (cost=1.14..266660108.78 rows=367790473 width=24) (actual time=821.282..891.607 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=274950
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..262062725.21 rows=367790473 width=32) (actual time=821.253..891.517 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_response_id,
m.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_translat
e, pa.in_reply_to
Index Cond: ((m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=274946
-> Materialize (cost=0.43..2.66 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=50)
Output: pi.app_id
Buffers: shared hit=4
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.020..0.020 rows=1 loops=1)
Output: pi.app_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=4
Planning time: 0.297 ms
Execution time: 891.686 ms
(20 rows)
On Thu, May 7, 2020 at 9:17 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Thu, May 7, 2020 at 7:40 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:On 5/7/20 4:19 AM, Amarendra Konda wrote:
> Hi,
>
> PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled
> by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
>
> We have noticed huge difference interms of execution plan ( response
> time) , When we pass the direct values Vs inner query to IN clause.
>
> High level details of the use case are as follows
>
> * As part of the SQL there are 2 tables named Process_instance
> (master) and Process_activity ( child)
> * Wanted to fetch TOP 50 rows from Process_activity table for the
> given values of the Process_instance.
> * When we used Inner Join / Inner query ( query1) between parent
> table and child table , LIMIT is not really taking in to account.
> Instead it is fetching more rows and columns that required, and
> finally limiting the result
It is doing what you told it to do which is SELECT all
process_instance_i's for user_id='317079413683604' and app_id =
'427380312000560' and then filtering further. I am going to guess that
if you run the inner query alone you will find it returns ~23496 rows.
You might have better results if you an actual join between
process_activity and process_instance. Something like below(obviously
not tested):What the OP seems to want is a semi-join:(not tested)SELECT pa.process_activity_idFROM process_activity pa WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 00:00:00'AND EXISTS (SELECT 1 FROM process_instance pi WHERE pi.app_id = pa.app_id AND pi.user_id = '317079413683604')ORDER BYpa.process_instance_id,pa.created limit 50;I'm unsure exactly how this will impact the plan choice but it should be an improvement, and in any case more correctly defines what it is you are looking for.David J.
Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
From
Amarendra Konda
Date:
Hi David,
In earlier reply, Over looked another condition, hence please ignore that one
Here is the correct one with all the needed conditions. According to the latest one, exists also not limiting rows from the process_activity table.
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS ( SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND pi.process_instance_id = pa.process_instance_id AND pi.user_id = '137074931866340') ORDER BY pa.process_instance_id, pa.created limit 50;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------
Limit (cost=1079.44..1079.52 rows=32 width=24) (actual time=85.747..85.777 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Sort (cost=1079.44..1079.52 rows=32 width=24) (actual time=85.745..85.759 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Sort Key: pa.process_instance_id, pa.created
Sort Method: top-N heapsort Memory: 28kB
Buffers: shared hit=43070
-> Nested Loop (cost=1.14..1078.64 rows=32 width=24) (actual time=0.025..72.115 rows=47011 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=16) (actual time=0.010..0.015 rows=2 loops=1)
Output: pi.app_id, pi.process_instance_id
Index Cond: (c.user_id = '137074931866340'::bigint)
Filter: (c.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1061.62 rows=1436 width=32) (actual time=0.011..20.320 rows=23506 loops=2)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_respons
e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_tr
anslate, pa.in_reply_to
Index Cond: ((m.process_instance_id = pi.process_instance_id) AND (m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=43065
Planning time: 0.455 ms
Execution time: 85.830 ms
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------
Limit (cost=1079.44..1079.52 rows=32 width=24) (actual time=85.747..85.777 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Sort (cost=1079.44..1079.52 rows=32 width=24) (actual time=85.745..85.759 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Sort Key: pa.process_instance_id, pa.created
Sort Method: top-N heapsort Memory: 28kB
Buffers: shared hit=43070
-> Nested Loop (cost=1.14..1078.64 rows=32 width=24) (actual time=0.025..72.115 rows=47011 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=16) (actual time=0.010..0.015 rows=2 loops=1)
Output: pi.app_id, pi.process_instance_id
Index Cond: (c.user_id = '137074931866340'::bigint)
Filter: (c.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1061.62 rows=1436 width=32) (actual time=0.011..20.320 rows=23506 loops=2)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_respons
e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_tr
anslate, pa.in_reply_to
Index Cond: ((m.process_instance_id = pi.process_instance_id) AND (m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=43065
Planning time: 0.455 ms
Execution time: 85.830 ms
On Thu, May 7, 2020 at 11:19 PM Amarendra Konda <amar.vijaya@gmail.com> wrote:
Hi David,Thanks for the reply.This has optimized number of rows.Can you please explain, why it is getting more columns in output, even though we have asked for only one column ?EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS ( SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND pi.user_id = '137074931866340') ORDER BY pa.process_instance_id,m.created limit 50;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------
Limit (cost=1.14..37.39 rows=50 width=24) (actual time=821.283..891.629 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=274950
-> Nested Loop Semi Join (cost=1.14..266660108.78 rows=367790473 width=24) (actual time=821.282..891.607 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=274950
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..262062725.21 rows=367790473 width=32) (actual time=821.253..891.517 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_response_id,
m.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_translat
e, pa.in_reply_to
Index Cond: ((m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=274946
-> Materialize (cost=0.43..2.66 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=50)
Output: pi.app_id
Buffers: shared hit=4
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.020..0.020 rows=1 loops=1)
Output: pi.app_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=4
Planning time: 0.297 ms
Execution time: 891.686 ms
(20 rows)On Thu, May 7, 2020 at 9:17 PM David G. Johnston <david.g.johnston@gmail.com> wrote:On Thu, May 7, 2020 at 7:40 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:On 5/7/20 4:19 AM, Amarendra Konda wrote:
> Hi,
>
> PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled
> by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
>
> We have noticed huge difference interms of execution plan ( response
> time) , When we pass the direct values Vs inner query to IN clause.
>
> High level details of the use case are as follows
>
> * As part of the SQL there are 2 tables named Process_instance
> (master) and Process_activity ( child)
> * Wanted to fetch TOP 50 rows from Process_activity table for the
> given values of the Process_instance.
> * When we used Inner Join / Inner query ( query1) between parent
> table and child table , LIMIT is not really taking in to account.
> Instead it is fetching more rows and columns that required, and
> finally limiting the result
It is doing what you told it to do which is SELECT all
process_instance_i's for user_id='317079413683604' and app_id =
'427380312000560' and then filtering further. I am going to guess that
if you run the inner query alone you will find it returns ~23496 rows.
You might have better results if you an actual join between
process_activity and process_instance. Something like below(obviously
not tested):What the OP seems to want is a semi-join:(not tested)SELECT pa.process_activity_idFROM process_activity pa WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 00:00:00'AND EXISTS (SELECT 1 FROM process_instance pi WHERE pi.app_id = pa.app_id AND pi.user_id = '317079413683604')ORDER BYpa.process_instance_id,pa.created limit 50;I'm unsure exactly how this will impact the plan choice but it should be an improvement, and in any case more correctly defines what it is you are looking for.David J.
Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
From
Amarendra Konda
Date:
Hi David,
In earlier reply, Over looked another condition, hence please ignore that one
Here is the correct one with all the needed conditions. According to the latest one, exists also not limiting rows from the process_activity table.
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS ( SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND pi.process_instance_id = pa.process_instance_id AND pi.user_id = '137074931866340') ORDER BY pa.process_instance_id, pa.created limit 50;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------
Limit (cost=1079.44..1079.52 rows=32 width=24) (actual time=85.747..85.777 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Sort (cost=1079.44..1079.52 rows=32 width=24) (actual time=85.745..85.759 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Sort Key: pa.process_instance_id, pa.created
Sort Method: top-N heapsort Memory: 28kB
Buffers: shared hit=43070
-> Nested Loop (cost=1.14..1078.64 rows=32 width=24) (actual time=0.025..72.115 rows=47011 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=16) (actual time=0.010..0.015 rows=2 loops=1)
Output: pi.app_id, pi.process_instance_id
Index Cond: (c.user_id = '137074931866340'::bigint)
Filter: (c.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1061.62 rows=1436 width=32) (actual time=0.011..20.320 rows=23506 loops=2)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_respons
e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_tr
anslate, pa.in_reply_to
Index Cond: ((m.process_instance_id = pi.process_instance_id) AND (m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=43065
Planning time: 0.455 ms
Execution time: 85.830 ms
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------
Limit (cost=1079.44..1079.52 rows=32 width=24) (actual time=85.747..85.777 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Sort (cost=1079.44..1079.52 rows=32 width=24) (actual time=85.745..85.759 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Sort Key: pa.process_instance_id, pa.created
Sort Method: top-N heapsort Memory: 28kB
Buffers: shared hit=43070
-> Nested Loop (cost=1.14..1078.64 rows=32 width=24) (actual time=0.025..72.115 rows=47011 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=16) (actual time=0.010..0.015 rows=2 loops=1)
Output: pi.app_id, pi.process_instance_id
Index Cond: (c.user_id = '137074931866340'::bigint)
Filter: (c.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1061.62 rows=1436 width=32) (actual time=0.011..20.320 rows=23506 loops=2)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_respons
e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_tr
anslate, pa.in_reply_to
Index Cond: ((m.process_instance_id = pi.process_instance_id) AND (m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=43065
Planning time: 0.455 ms
Execution time: 85.830 ms
On Thu, May 7, 2020 at 11:19 PM Amarendra Konda <amar.vijaya@gmail.com> wrote:
Hi David,Thanks for the reply.This has optimized number of rows.Can you please explain, why it is getting more columns in output, even though we have asked for only one column ?EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS ( SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND pi.user_id = '137074931866340') ORDER BY pa.process_instance_id,m.created limit 50;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------
Limit (cost=1.14..37.39 rows=50 width=24) (actual time=821.283..891.629 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=274950
-> Nested Loop Semi Join (cost=1.14..266660108.78 rows=367790473 width=24) (actual time=821.282..891.607 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=274950
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..262062725.21 rows=367790473 width=32) (actual time=821.253..891.517 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_response_id,
m.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_translat
e, pa.in_reply_to
Index Cond: ((m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=274946
-> Materialize (cost=0.43..2.66 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=50)
Output: pi.app_id
Buffers: shared hit=4
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.020..0.020 rows=1 loops=1)
Output: pi.app_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=4
Planning time: 0.297 ms
Execution time: 891.686 ms
(20 rows)On Thu, May 7, 2020 at 9:17 PM David G. Johnston <david.g.johnston@gmail.com> wrote:On Thu, May 7, 2020 at 7:40 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:On 5/7/20 4:19 AM, Amarendra Konda wrote:
> Hi,
>
> PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled
> by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
>
> We have noticed huge difference interms of execution plan ( response
> time) , When we pass the direct values Vs inner query to IN clause.
>
> High level details of the use case are as follows
>
> * As part of the SQL there are 2 tables named Process_instance
> (master) and Process_activity ( child)
> * Wanted to fetch TOP 50 rows from Process_activity table for the
> given values of the Process_instance.
> * When we used Inner Join / Inner query ( query1) between parent
> table and child table , LIMIT is not really taking in to account.
> Instead it is fetching more rows and columns that required, and
> finally limiting the result
It is doing what you told it to do which is SELECT all
process_instance_i's for user_id='317079413683604' and app_id =
'427380312000560' and then filtering further. I am going to guess that
if you run the inner query alone you will find it returns ~23496 rows.
You might have better results if you an actual join between
process_activity and process_instance. Something like below(obviously
not tested):What the OP seems to want is a semi-join:(not tested)SELECT pa.process_activity_idFROM process_activity pa WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 00:00:00'AND EXISTS (SELECT 1 FROM process_instance pi WHERE pi.app_id = pa.app_id AND pi.user_id = '317079413683604')ORDER BYpa.process_instance_id,pa.created limit 50;I'm unsure exactly how this will impact the plan choice but it should be an improvement, and in any case more correctly defines what it is you are looking for.David J.
Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
From
Amarendra Konda
Date:
Hi Virendra,
Thanks for your time.
Here is the table and index structure
process_activity
Table "public.process_activity"
Column | Type | Modifiers
--------------------+-----------------------------+----------------------------
process_activity_id | bigint | not null default next_id()
process_activity_type | smallint | not null
voice_url | text |
process_activity_user_id | bigint | not null
app_id | bigint | not null
process_instance_id | bigint | not null
alias | text | not null
read_by_user | smallint | default 0
source | smallint | default 0
label_category_id | bigint |
label_id | bigint |
csat_response_id | bigint |
process_activity_fragments | jsonb |
created | timestamp without time zone | not null
updated | timestamp without time zone |
rule_id | bigint |
marketing_reply_id | bigint |
delivered_at | timestamp without time zone |
reply_fragments | jsonb |
status_fragment | jsonb |
internal_meta | jsonb |
interaction_id | text |
do_not_translate | boolean |
should_translate | integer |
in_reply_to | jsonb |
Indexes:
"process_activity_pkey" PRIMARY KEY, btree (process_activity_id)
"fki_process_activity_konotor_user_user_id" btree (process_activity_user_id) WITH (fillfactor='70')
"process_activity_process_instance_id_app_id_created_idx" btree (process_instance_id, app_id, created) WITH (fillfactor='70')
"process_activity_process_instance_id_app_id_read_by_user_created_idx" btree (process_instance_id, app_id, read_by_user, created) WITH (fillfactor='70')
"process_activity_process_instance_id_idx" btree (process_instance_id) WITH (fillfactor='70')
Table "public.process_activity"
Column | Type | Modifiers
--------------------+-----------------------------+----------------------------
process_activity_id | bigint | not null default next_id()
process_activity_type | smallint | not null
voice_url | text |
process_activity_user_id | bigint | not null
app_id | bigint | not null
process_instance_id | bigint | not null
alias | text | not null
read_by_user | smallint | default 0
source | smallint | default 0
label_category_id | bigint |
label_id | bigint |
csat_response_id | bigint |
process_activity_fragments | jsonb |
created | timestamp without time zone | not null
updated | timestamp without time zone |
rule_id | bigint |
marketing_reply_id | bigint |
delivered_at | timestamp without time zone |
reply_fragments | jsonb |
status_fragment | jsonb |
internal_meta | jsonb |
interaction_id | text |
do_not_translate | boolean |
should_translate | integer |
in_reply_to | jsonb |
Indexes:
"process_activity_pkey" PRIMARY KEY, btree (process_activity_id)
"fki_process_activity_konotor_user_user_id" btree (process_activity_user_id) WITH (fillfactor='70')
"process_activity_process_instance_id_app_id_created_idx" btree (process_instance_id, app_id, created) WITH (fillfactor='70')
"process_activity_process_instance_id_app_id_read_by_user_created_idx" btree (process_instance_id, app_id, read_by_user, created) WITH (fillfactor='70')
"process_activity_process_instance_id_idx" btree (process_instance_id) WITH (fillfactor='70')
process_instance
Table "public.process_instance"
Column | Type | Modifiers
-------------------------+-----------------------------+-----------------------------
process_instance_id | bigint | not null default next_id()
process_instance_alias | text | not null
app_id | bigint | not null
user_id | bigint | not null
Indexes:
"process_instance_pkey" PRIMARY KEY, btree (process_instance_id)
"fki_conv_konotor_user_user_id" btree (user_id) WITH (fillfactor='70')
Table "public.process_instance"
Column | Type | Modifiers
-------------------------+-----------------------------+-----------------------------
process_instance_id | bigint | not null default next_id()
process_instance_alias | text | not null
app_id | bigint | not null
user_id | bigint | not null
Indexes:
"process_instance_pkey" PRIMARY KEY, btree (process_instance_id)
"fki_conv_konotor_user_user_id" btree (user_id) WITH (fillfactor='70')
Regards, Amarendra
On Fri, May 8, 2020 at 12:01 AM Virendra Kumar <viru_7683@yahoo.com> wrote:
Sending table structure with indexes might help little further in understanding.Regards,VirendraOn Thursday, May 7, 2020, 11:08:14 AM PDT, Amarendra Konda <amar.vijaya@gmail.com> wrote:Hi David,In earlier reply, Over looked another condition, hence please ignore that oneHere is the correct one with all the needed conditions. According to the latest one, exists also not limiting rows from the process_activity table.EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS ( SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND pi.process_instance_id = pa.process_instance_id AND pi.user_id = '137074931866340') ORDER BY pa.process_instance_id, pa.created limit 50;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------
Limit (cost=1079.44..1079.52 rows=32 width=24) (actual time=85.747..85.777 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Sort (cost=1079.44..1079.52 rows=32 width=24) (actual time=85.745..85.759 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Sort Key: pa.process_instance_id, pa.created
Sort Method: top-N heapsort Memory: 28kB
Buffers: shared hit=43070
-> Nested Loop (cost=1.14..1078.64 rows=32 width=24) (actual time=0.025..72.115 rows=47011 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=16) (actual time=0.010..0.015 rows=2 loops=1)
Output: pi.app_id, pi.process_instance_id
Index Cond: (c.user_id = '137074931866340'::bigint)
Filter: (c.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1061.62 rows=1436 width=32) (actual time=0.011..20.320 rows=23506 loops=2)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_respons
e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_tr
anslate, pa.in_reply_to
Index Cond: ((m.process_instance_id = pi.process_instance_id) AND (m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=43065
Planning time: 0.455 ms
Execution time: 85.830 msOn Thu, May 7, 2020 at 11:19 PM Amarendra Konda <amar.vijaya@gmail.com> wrote:Hi David,Thanks for the reply.This has optimized number of rows.Can you please explain, why it is getting more columns in output, even though we have asked for only one column ?EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS ( SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND pi.user_id = '137074931866340') ORDER BY pa.process_instance_id,m.created limit 50;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------
Limit (cost=1.14..37.39 rows=50 width=24) (actual time=821.283..891.629 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=274950
-> Nested Loop Semi Join (cost=1.14..266660108.78 rows=367790473 width=24) (actual time=821.282..891.607 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=274950
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..262062725.21 rows=367790473 width=32) (actual time=821.253..891.517 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_response_id,
m.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_translat
e, pa.in_reply_to
Index Cond: ((m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=274946
-> Materialize (cost=0.43..2.66 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=50)
Output: pi.app_id
Buffers: shared hit=4
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.020..0.020 rows=1 loops=1)
Output: pi.app_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=4
Planning time: 0.297 ms
Execution time: 891.686 ms
(20 rows)On Thu, May 7, 2020 at 9:17 PM David G. Johnston <david.g.johnston@gmail.com> wrote:On Thu, May 7, 2020 at 7:40 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:On 5/7/20 4:19 AM, Amarendra Konda wrote:
> Hi,
>
> PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled
> by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
>
> We have noticed huge difference interms of execution plan ( response
> time) , When we pass the direct values Vs inner query to IN clause.
>
> High level details of the use case are as follows
>
> * As part of the SQL there are 2 tables named Process_instance
> (master) and Process_activity ( child)
> * Wanted to fetch TOP 50 rows from Process_activity table for the
> given values of the Process_instance.
> * When we used Inner Join / Inner query ( query1) between parent
> table and child table , LIMIT is not really taking in to account.
> Instead it is fetching more rows and columns that required, and
> finally limiting the result
It is doing what you told it to do which is SELECT all
process_instance_i's for user_id='317079413683604' and app_id =
'427380312000560' and then filtering further. I am going to guess that
if you run the inner query alone you will find it returns ~23496 rows.
You might have better results if you an actual join between
process_activity and process_instance. Something like below(obviously
not tested):What the OP seems to want is a semi-join:(not tested)SELECT pa.process_activity_idFROM process_activity pa WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 00:00:00'AND EXISTS (SELECT 1 FROM process_instance pi WHERE pi.app_id = pa.app_id AND pi.user_id = '317079413683604')ORDER BYpa.process_instance_id,pa.created limit 50;I'm unsure exactly how this will impact the plan choice but it should be an improvement, and in any case more correctly defines what it is you are looking for.David J.
Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
From
Amarendra Konda
Date:
Hi Virendra,
Thanks for your time.
Here is the table and index structure
process_activity
Table "public.process_activity"
Column | Type | Modifiers
--------------------+-----------------------------+----------------------------
process_activity_id | bigint | not null default next_id()
process_activity_type | smallint | not null
voice_url | text |
process_activity_user_id | bigint | not null
app_id | bigint | not null
process_instance_id | bigint | not null
alias | text | not null
read_by_user | smallint | default 0
source | smallint | default 0
label_category_id | bigint |
label_id | bigint |
csat_response_id | bigint |
process_activity_fragments | jsonb |
created | timestamp without time zone | not null
updated | timestamp without time zone |
rule_id | bigint |
marketing_reply_id | bigint |
delivered_at | timestamp without time zone |
reply_fragments | jsonb |
status_fragment | jsonb |
internal_meta | jsonb |
interaction_id | text |
do_not_translate | boolean |
should_translate | integer |
in_reply_to | jsonb |
Indexes:
"process_activity_pkey" PRIMARY KEY, btree (process_activity_id)
"fki_process_activity_konotor_user_user_id" btree (process_activity_user_id) WITH (fillfactor='70')
"process_activity_process_instance_id_app_id_created_idx" btree (process_instance_id, app_id, created) WITH (fillfactor='70')
"process_activity_process_instance_id_app_id_read_by_user_created_idx" btree (process_instance_id, app_id, read_by_user, created) WITH (fillfactor='70')
"process_activity_process_instance_id_idx" btree (process_instance_id) WITH (fillfactor='70')
Table "public.process_activity"
Column | Type | Modifiers
--------------------+-----------------------------+----------------------------
process_activity_id | bigint | not null default next_id()
process_activity_type | smallint | not null
voice_url | text |
process_activity_user_id | bigint | not null
app_id | bigint | not null
process_instance_id | bigint | not null
alias | text | not null
read_by_user | smallint | default 0
source | smallint | default 0
label_category_id | bigint |
label_id | bigint |
csat_response_id | bigint |
process_activity_fragments | jsonb |
created | timestamp without time zone | not null
updated | timestamp without time zone |
rule_id | bigint |
marketing_reply_id | bigint |
delivered_at | timestamp without time zone |
reply_fragments | jsonb |
status_fragment | jsonb |
internal_meta | jsonb |
interaction_id | text |
do_not_translate | boolean |
should_translate | integer |
in_reply_to | jsonb |
Indexes:
"process_activity_pkey" PRIMARY KEY, btree (process_activity_id)
"fki_process_activity_konotor_user_user_id" btree (process_activity_user_id) WITH (fillfactor='70')
"process_activity_process_instance_id_app_id_created_idx" btree (process_instance_id, app_id, created) WITH (fillfactor='70')
"process_activity_process_instance_id_app_id_read_by_user_created_idx" btree (process_instance_id, app_id, read_by_user, created) WITH (fillfactor='70')
"process_activity_process_instance_id_idx" btree (process_instance_id) WITH (fillfactor='70')
process_instance
Table "public.process_instance"
Column | Type | Modifiers
-------------------------+-----------------------------+-----------------------------
process_instance_id | bigint | not null default next_id()
process_instance_alias | text | not null
app_id | bigint | not null
user_id | bigint | not null
Indexes:
"process_instance_pkey" PRIMARY KEY, btree (process_instance_id)
"fki_conv_konotor_user_user_id" btree (user_id) WITH (fillfactor='70')
Table "public.process_instance"
Column | Type | Modifiers
-------------------------+-----------------------------+-----------------------------
process_instance_id | bigint | not null default next_id()
process_instance_alias | text | not null
app_id | bigint | not null
user_id | bigint | not null
Indexes:
"process_instance_pkey" PRIMARY KEY, btree (process_instance_id)
"fki_conv_konotor_user_user_id" btree (user_id) WITH (fillfactor='70')
Regards, Amarendra
On Fri, May 8, 2020 at 12:01 AM Virendra Kumar <viru_7683@yahoo.com> wrote:
Sending table structure with indexes might help little further in understanding.Regards,VirendraOn Thursday, May 7, 2020, 11:08:14 AM PDT, Amarendra Konda <amar.vijaya@gmail.com> wrote:Hi David,In earlier reply, Over looked another condition, hence please ignore that oneHere is the correct one with all the needed conditions. According to the latest one, exists also not limiting rows from the process_activity table.EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS ( SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND pi.process_instance_id = pa.process_instance_id AND pi.user_id = '137074931866340') ORDER BY pa.process_instance_id, pa.created limit 50;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------
Limit (cost=1079.44..1079.52 rows=32 width=24) (actual time=85.747..85.777 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Sort (cost=1079.44..1079.52 rows=32 width=24) (actual time=85.745..85.759 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Sort Key: pa.process_instance_id, pa.created
Sort Method: top-N heapsort Memory: 28kB
Buffers: shared hit=43070
-> Nested Loop (cost=1.14..1078.64 rows=32 width=24) (actual time=0.025..72.115 rows=47011 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=16) (actual time=0.010..0.015 rows=2 loops=1)
Output: pi.app_id, pi.process_instance_id
Index Cond: (c.user_id = '137074931866340'::bigint)
Filter: (c.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1061.62 rows=1436 width=32) (actual time=0.011..20.320 rows=23506 loops=2)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_respons
e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_tr
anslate, pa.in_reply_to
Index Cond: ((m.process_instance_id = pi.process_instance_id) AND (m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=43065
Planning time: 0.455 ms
Execution time: 85.830 msOn Thu, May 7, 2020 at 11:19 PM Amarendra Konda <amar.vijaya@gmail.com> wrote:Hi David,Thanks for the reply.This has optimized number of rows.Can you please explain, why it is getting more columns in output, even though we have asked for only one column ?EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS ( SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND pi.user_id = '137074931866340') ORDER BY pa.process_instance_id,m.created limit 50;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------
Limit (cost=1.14..37.39 rows=50 width=24) (actual time=821.283..891.629 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=274950
-> Nested Loop Semi Join (cost=1.14..266660108.78 rows=367790473 width=24) (actual time=821.282..891.607 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=274950
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..262062725.21 rows=367790473 width=32) (actual time=821.253..891.517 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_response_id,
m.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_translat
e, pa.in_reply_to
Index Cond: ((m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=274946
-> Materialize (cost=0.43..2.66 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=50)
Output: pi.app_id
Buffers: shared hit=4
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.020..0.020 rows=1 loops=1)
Output: pi.app_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=4
Planning time: 0.297 ms
Execution time: 891.686 ms
(20 rows)On Thu, May 7, 2020 at 9:17 PM David G. Johnston <david.g.johnston@gmail.com> wrote:On Thu, May 7, 2020 at 7:40 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:On 5/7/20 4:19 AM, Amarendra Konda wrote:
> Hi,
>
> PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled
> by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
>
> We have noticed huge difference interms of execution plan ( response
> time) , When we pass the direct values Vs inner query to IN clause.
>
> High level details of the use case are as follows
>
> * As part of the SQL there are 2 tables named Process_instance
> (master) and Process_activity ( child)
> * Wanted to fetch TOP 50 rows from Process_activity table for the
> given values of the Process_instance.
> * When we used Inner Join / Inner query ( query1) between parent
> table and child table , LIMIT is not really taking in to account.
> Instead it is fetching more rows and columns that required, and
> finally limiting the result
It is doing what you told it to do which is SELECT all
process_instance_i's for user_id='317079413683604' and app_id =
'427380312000560' and then filtering further. I am going to guess that
if you run the inner query alone you will find it returns ~23496 rows.
You might have better results if you an actual join between
process_activity and process_instance. Something like below(obviously
not tested):What the OP seems to want is a semi-join:(not tested)SELECT pa.process_activity_idFROM process_activity pa WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 00:00:00'AND EXISTS (SELECT 1 FROM process_instance pi WHERE pi.app_id = pa.app_id AND pi.user_id = '317079413683604')ORDER BYpa.process_instance_id,pa.created limit 50;I'm unsure exactly how this will impact the plan choice but it should be an improvement, and in any case more correctly defines what it is you are looking for.David J.
Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
From
Adrian Klaver
Date:
On 5/7/20 10:49 AM, Amarendra Konda wrote: > Hi David, > > Thanks for the reply.This has optimized number of rows. Yeah, but your execution time has increased an order of magnitude. Not sure if that is what you want. > > Can you please explain, why it is getting more columns in output, even > though we have asked for only one column ? > > > EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT > pa.process_activity_id AS pa_process_activity_id FROM process_activity > pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 > 00:00:00' AND EXISTS ( SELECT 1 FROM process_instance pi where > pi.app_id = pa.app_id AND pi.user_id = '137074931866340') ORDER BY > pa.process_instance_id,m.created limit 50; > > QUERY PLAN > > ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > ----------------- > Limit (cost=1.14..37.39 rows=50 width=24) (actual > time=821.283..891.629 rows=50 loops=1) > Output: pa.process_activity_id, pa.process_instance_id, pa.created > Buffers: shared hit=274950 > -> Nested Loop Semi Join (cost=1.14..266660108.78 rows=367790473 > width=24) (actual time=821.282..891.607 rows=50 loops=1) > Output: pa.process_activity_id, pa.process_instance_id, pa.created > Buffers: shared hit=274950 > -> Index Scan using > process_activity_process_instance_id_app_id_created_idx on > public.process_activity pa (cost=0.70..262062725.21 rows=367790473 > width=32) (actual time=821.253..891.517 rows=50 loops=1) > * Output: pa.process_activity_id, pa.process_activity_type, > pa.voice_url, pa.process_activity_user_id, pa.app_id, > pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, > pa.label_category_id, pa.label_id, pa.csat_response_id, > m.process_activity_fragments, pa.created, pa.updated, pa.rule_id, > pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, > pa.status_fragment, pa.internal_meta, pa.interaction_id, > pa.do_not_translate, pa.should_translat > e, pa.in_reply_to* > Index Cond: ((m.app_id = '126502930200650'::bigint) AND > (m.created > '1970-01-01 00:00:00'::timestamp without time zone)) > Buffers: shared hit=274946 > -> Materialize (cost=0.43..2.66 rows=1 width=8) (actual > time=0.001..0.001 rows=1 loops=50) > Output: pi.app_id > Buffers: shared hit=4 > -> Index Scan using fki_conv_konotor_user_user_id on > public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual > time=0.020..0.020 rows=1 loops=1) > Output: pi.app_id > Index Cond: (pi.user_id = '137074931866340'::bigint) > Filter: (pi.app_id = '126502930200650'::bigint) > Buffers: shared hit=4 > Planning time: 0.297 ms > Execution time: 891.686 ms > (20 rows) > > On Thu, May 7, 2020 at 9:17 PM David G. Johnston > <david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com>> wrote: > > On Thu, May 7, 2020 at 7:40 AM Adrian Klaver > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote: > > On 5/7/20 4:19 AM, Amarendra Konda wrote: > > Hi, > > > > PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, > compiled > > by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit > > > > We have noticed huge difference interms of execution plan ( > response > > time) , When we pass the direct values Vs inner query to IN > clause. > > > > High level details of the use case are as follows > > > > * As part of the SQL there are 2 tables named Process_instance > > (master) and Process_activity ( child) > > * Wanted to fetch TOP 50 rows from Process_activity table > for the > > given values of the Process_instance. > > * When we used Inner Join / Inner query ( query1) between > parent > > table and child table , LIMIT is not really taking in to > account. > > Instead it is fetching more rows and columns that > required, and > > finally limiting the result > > It is doing what you told it to do which is SELECT all > process_instance_i's for user_id='317079413683604' and app_id = > '427380312000560' and then filtering further. I am going to > guess that > if you run the inner query alone you will find it returns ~23496 > rows. > You might have better results if you an actual join between > process_activity and process_instance. Something like > below(obviously > not tested): > > > What the OP seems to want is a semi-join: > > (not tested) > > SELECT pa.process_activity_id > FROM process_activity pa WHERE pa.app_id = '427380312000560' AND > pa.created > '1970-01-01 00:00:00' > ANDEXISTS ( > SELECT 1 FROM process_instance pi WHERE pi.app_id = pa.app_id AND > pi.user_id = '317079413683604' > ) > ORDER BY > pa.process_instance_id, > pa.created limit 50; > > I'm unsure exactly how this will impact the plan choice but it > should be an improvement, and in any case more correctly defines > what it is you are looking for. > > David J. > -- Adrian Klaver adrian.klaver@aklaver.com
Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
From
Adrian Klaver
Date:
On 5/7/20 10:49 AM, Amarendra Konda wrote: > Hi David, > > Thanks for the reply.This has optimized number of rows. Yeah, but your execution time has increased an order of magnitude. Not sure if that is what you want. > > Can you please explain, why it is getting more columns in output, even > though we have asked for only one column ? > > > EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT > pa.process_activity_id AS pa_process_activity_id FROM process_activity > pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 > 00:00:00' AND EXISTS ( SELECT 1 FROM process_instance pi where > pi.app_id = pa.app_id AND pi.user_id = '137074931866340') ORDER BY > pa.process_instance_id,m.created limit 50; > > QUERY PLAN > > ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > ----------------- > Limit (cost=1.14..37.39 rows=50 width=24) (actual > time=821.283..891.629 rows=50 loops=1) > Output: pa.process_activity_id, pa.process_instance_id, pa.created > Buffers: shared hit=274950 > -> Nested Loop Semi Join (cost=1.14..266660108.78 rows=367790473 > width=24) (actual time=821.282..891.607 rows=50 loops=1) > Output: pa.process_activity_id, pa.process_instance_id, pa.created > Buffers: shared hit=274950 > -> Index Scan using > process_activity_process_instance_id_app_id_created_idx on > public.process_activity pa (cost=0.70..262062725.21 rows=367790473 > width=32) (actual time=821.253..891.517 rows=50 loops=1) > * Output: pa.process_activity_id, pa.process_activity_type, > pa.voice_url, pa.process_activity_user_id, pa.app_id, > pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, > pa.label_category_id, pa.label_id, pa.csat_response_id, > m.process_activity_fragments, pa.created, pa.updated, pa.rule_id, > pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, > pa.status_fragment, pa.internal_meta, pa.interaction_id, > pa.do_not_translate, pa.should_translat > e, pa.in_reply_to* > Index Cond: ((m.app_id = '126502930200650'::bigint) AND > (m.created > '1970-01-01 00:00:00'::timestamp without time zone)) > Buffers: shared hit=274946 > -> Materialize (cost=0.43..2.66 rows=1 width=8) (actual > time=0.001..0.001 rows=1 loops=50) > Output: pi.app_id > Buffers: shared hit=4 > -> Index Scan using fki_conv_konotor_user_user_id on > public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual > time=0.020..0.020 rows=1 loops=1) > Output: pi.app_id > Index Cond: (pi.user_id = '137074931866340'::bigint) > Filter: (pi.app_id = '126502930200650'::bigint) > Buffers: shared hit=4 > Planning time: 0.297 ms > Execution time: 891.686 ms > (20 rows) > > On Thu, May 7, 2020 at 9:17 PM David G. Johnston > <david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com>> wrote: > > On Thu, May 7, 2020 at 7:40 AM Adrian Klaver > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote: > > On 5/7/20 4:19 AM, Amarendra Konda wrote: > > Hi, > > > > PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, > compiled > > by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit > > > > We have noticed huge difference interms of execution plan ( > response > > time) , When we pass the direct values Vs inner query to IN > clause. > > > > High level details of the use case are as follows > > > > * As part of the SQL there are 2 tables named Process_instance > > (master) and Process_activity ( child) > > * Wanted to fetch TOP 50 rows from Process_activity table > for the > > given values of the Process_instance. > > * When we used Inner Join / Inner query ( query1) between > parent > > table and child table , LIMIT is not really taking in to > account. > > Instead it is fetching more rows and columns that > required, and > > finally limiting the result > > It is doing what you told it to do which is SELECT all > process_instance_i's for user_id='317079413683604' and app_id = > '427380312000560' and then filtering further. I am going to > guess that > if you run the inner query alone you will find it returns ~23496 > rows. > You might have better results if you an actual join between > process_activity and process_instance. Something like > below(obviously > not tested): > > > What the OP seems to want is a semi-join: > > (not tested) > > SELECT pa.process_activity_id > FROM process_activity pa WHERE pa.app_id = '427380312000560' AND > pa.created > '1970-01-01 00:00:00' > ANDEXISTS ( > SELECT 1 FROM process_instance pi WHERE pi.app_id = pa.app_id AND > pi.user_id = '317079413683604' > ) > ORDER BY > pa.process_instance_id, > pa.created limit 50; > > I'm unsure exactly how this will impact the plan choice but it > should be an improvement, and in any case more correctly defines > what it is you are looking for. > > David J. > -- Adrian Klaver adrian.klaver@aklaver.com
Re: Explain plan changes - IN CLAUSE ( Passing direct values VsINNER Query )
From
Virendra Kumar
Date:
Here is my thought on why row is not limiting when joined vs why it is limiting when not joined.
When not joined and where clause is having IN, it is using index process_activity_process_instance_id_app_id_created_idx which has columns process_instance_id, created which is in order by and hence no additional ordering is required and a direct rows limit can be applied here.
When in join condition it has to fetch rows according to filter clause, join them and then order ( sort node in plan) hence it cannot limit rows while fetching it first time from the table.
You are also missing pi.user_id = '317079413683604' in exists clause. It is worth trying to put there and run explain again and see where it takes. But to your point row limitation cannot happen in case of join as such in the query.
Regards,
Virendra
On Thursday, May 7, 2020, 11:52:00 AM PDT, Amarendra Konda <amar.vijaya@gmail.com> wrote:
Hi Virendra,
Thanks for your time.
Here is the table and index structure
process_activity
Table "public.process_activity"
Column | Type | Modifiers
--------------------+-----------------------------+----------------------------
process_activity_id | bigint | not null default next_id()
process_activity_type | smallint | not null
voice_url | text |
process_activity_user_id | bigint | not null
app_id | bigint | not null
process_instance_id | bigint | not null
alias | text | not null
read_by_user | smallint | default 0
source | smallint | default 0
label_category_id | bigint |
label_id | bigint |
csat_response_id | bigint |
process_activity_fragments | jsonb |
created | timestamp without time zone | not null
updated | timestamp without time zone |
rule_id | bigint |
marketing_reply_id | bigint |
delivered_at | timestamp without time zone |
reply_fragments | jsonb |
status_fragment | jsonb |
internal_meta | jsonb |
interaction_id | text |
do_not_translate | boolean |
should_translate | integer |
in_reply_to | jsonb |
Indexes:
"process_activity_pkey" PRIMARY KEY, btree (process_activity_id)
"fki_process_activity_konotor_user_user_id" btree (process_activity_user_id) WITH (fillfactor='70')
"process_activity_process_instance_id_app_id_created_idx" btree (process_instance_id, app_id, created) WITH (fillfactor='70')
"process_activity_process_instance_id_app_id_read_by_user_created_idx" btree (process_instance_id, app_id, read_by_user, created) WITH (fillfactor='70')
"process_activity_process_instance_id_idx" btree (process_instance_id) WITH (fillfactor='70')
Table "public.process_activity"
Column | Type | Modifiers
--------------------+-----------------------------+----------------------------
process_activity_id | bigint | not null default next_id()
process_activity_type | smallint | not null
voice_url | text |
process_activity_user_id | bigint | not null
app_id | bigint | not null
process_instance_id | bigint | not null
alias | text | not null
read_by_user | smallint | default 0
source | smallint | default 0
label_category_id | bigint |
label_id | bigint |
csat_response_id | bigint |
process_activity_fragments | jsonb |
created | timestamp without time zone | not null
updated | timestamp without time zone |
rule_id | bigint |
marketing_reply_id | bigint |
delivered_at | timestamp without time zone |
reply_fragments | jsonb |
status_fragment | jsonb |
internal_meta | jsonb |
interaction_id | text |
do_not_translate | boolean |
should_translate | integer |
in_reply_to | jsonb |
Indexes:
"process_activity_pkey" PRIMARY KEY, btree (process_activity_id)
"fki_process_activity_konotor_user_user_id" btree (process_activity_user_id) WITH (fillfactor='70')
"process_activity_process_instance_id_app_id_created_idx" btree (process_instance_id, app_id, created) WITH (fillfactor='70')
"process_activity_process_instance_id_app_id_read_by_user_created_idx" btree (process_instance_id, app_id, read_by_user, created) WITH (fillfactor='70')
"process_activity_process_instance_id_idx" btree (process_instance_id) WITH (fillfactor='70')
process_instance
Table "public.process_instance"
Column | Type | Modifiers
-------------------------+-----------------------------+-----------------------------
process_instance_id | bigint | not null default next_id()
process_instance_alias | text | not null
app_id | bigint | not null
user_id | bigint | not null
Indexes:
"process_instance_pkey" PRIMARY KEY, btree (process_instance_id)
"fki_conv_konotor_user_user_id" btree (user_id) WITH (fillfactor='70')
Table "public.process_instance"
Column | Type | Modifiers
-------------------------+-----------------------------+-----------------------------
process_instance_id | bigint | not null default next_id()
process_instance_alias | text | not null
app_id | bigint | not null
user_id | bigint | not null
Indexes:
"process_instance_pkey" PRIMARY KEY, btree (process_instance_id)
"fki_conv_konotor_user_user_id" btree (user_id) WITH (fillfactor='70')
Regards, Amarendra
On Fri, May 8, 2020 at 12:01 AM Virendra Kumar <viru_7683@yahoo.com> wrote:
Sending table structure with indexes might help little further in understanding.Regards,VirendraOn Thursday, May 7, 2020, 11:08:14 AM PDT, Amarendra Konda <amar.vijaya@gmail.com> wrote:Hi David,In earlier reply, Over looked another condition, hence please ignore that oneHere is the correct one with all the needed conditions. According to the latest one, exists also not limiting rows from the process_activity table.EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS ( SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND pi.process_instance_id = pa.process_instance_id AND pi.user_id = '137074931866340') ORDER BY pa.process_instance_id, pa.created limit 50;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------
Limit (cost=1079.44..1079.52 rows=32 width=24) (actual time=85.747..85.777 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Sort (cost=1079.44..1079.52 rows=32 width=24) (actual time=85.745..85.759 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Sort Key: pa.process_instance_id, pa.created
Sort Method: top-N heapsort Memory: 28kB
Buffers: shared hit=43070
-> Nested Loop (cost=1.14..1078.64 rows=32 width=24) (actual time=0.025..72.115 rows=47011 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=16) (actual time=0.010..0.015 rows=2 loops=1)
Output: pi.app_id, pi.process_instance_id
Index Cond: (c.user_id = '137074931866340'::bigint)
Filter: (c.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1061.62 rows=1436 width=32) (actual time=0.011..20.320 rows=23506 loops=2)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_respons
e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_tr
anslate, pa.in_reply_to
Index Cond: ((m.process_instance_id = pi.process_instance_id) AND (m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=43065
Planning time: 0.455 ms
Execution time: 85.830 msOn Thu, May 7, 2020 at 11:19 PM Amarendra Konda <amar.vijaya@gmail.com> wrote:Hi David,Thanks for the reply.This has optimized number of rows.Can you please explain, why it is getting more columns in output, even though we have asked for only one column ?EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS ( SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND pi.user_id = '137074931866340') ORDER BY pa.process_instance_id,m.created limit 50;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------
Limit (cost=1.14..37.39 rows=50 width=24) (actual time=821.283..891.629 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=274950
-> Nested Loop Semi Join (cost=1.14..266660108.78 rows=367790473 width=24) (actual time=821.282..891.607 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=274950
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..262062725.21 rows=367790473 width=32) (actual time=821.253..891.517 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_response_id,
m.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_translat
e, pa.in_reply_to
Index Cond: ((m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=274946
-> Materialize (cost=0.43..2.66 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=50)
Output: pi.app_id
Buffers: shared hit=4
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.020..0.020 rows=1 loops=1)
Output: pi.app_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=4
Planning time: 0.297 ms
Execution time: 891.686 ms
(20 rows)On Thu, May 7, 2020 at 9:17 PM David G. Johnston <david.g.johnston@gmail.com> wrote:On Thu, May 7, 2020 at 7:40 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:On 5/7/20 4:19 AM, Amarendra Konda wrote:
> Hi,
>
> PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled
> by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
>
> We have noticed huge difference interms of execution plan ( response
> time) , When we pass the direct values Vs inner query to IN clause.
>
> High level details of the use case are as follows
>
> * As part of the SQL there are 2 tables named Process_instance
> (master) and Process_activity ( child)
> * Wanted to fetch TOP 50 rows from Process_activity table for the
> given values of the Process_instance.
> * When we used Inner Join / Inner query ( query1) between parent
> table and child table , LIMIT is not really taking in to account.
> Instead it is fetching more rows and columns that required, and
> finally limiting the result
It is doing what you told it to do which is SELECT all
process_instance_i's for user_id='317079413683604' and app_id =
'427380312000560' and then filtering further. I am going to guess that
if you run the inner query alone you will find it returns ~23496 rows.
You might have better results if you an actual join between
process_activity and process_instance. Something like below(obviously
not tested):What the OP seems to want is a semi-join:(not tested)SELECT pa.process_activity_idFROM process_activity pa WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 00:00:00'AND EXISTS (SELECT 1 FROM process_instance pi WHERE pi.app_id = pa.app_id AND pi.user_id = '317079413683604')ORDER BYpa.process_instance_id,pa.created limit 50;I'm unsure exactly how this will impact the plan choice but it should be an improvement, and in any case more correctly defines what it is you are looking for.David J.
Re: Explain plan changes - IN CLAUSE ( Passing direct values VsINNER Query )
From
Virendra Kumar
Date:
Here is my thought on why row is not limiting when joined vs why it is limiting when not joined.
When not joined and where clause is having IN, it is using index process_activity_process_instance_id_app_id_created_idx which has columns process_instance_id, created which is in order by and hence no additional ordering is required and a direct rows limit can be applied here.
When in join condition it has to fetch rows according to filter clause, join them and then order ( sort node in plan) hence it cannot limit rows while fetching it first time from the table.
You are also missing pi.user_id = '317079413683604' in exists clause. It is worth trying to put there and run explain again and see where it takes. But to your point row limitation cannot happen in case of join as such in the query.
Regards,
Virendra
On Thursday, May 7, 2020, 11:52:00 AM PDT, Amarendra Konda <amar.vijaya@gmail.com> wrote:
Hi Virendra,
Thanks for your time.
Here is the table and index structure
process_activity
Table "public.process_activity"
Column | Type | Modifiers
--------------------+-----------------------------+----------------------------
process_activity_id | bigint | not null default next_id()
process_activity_type | smallint | not null
voice_url | text |
process_activity_user_id | bigint | not null
app_id | bigint | not null
process_instance_id | bigint | not null
alias | text | not null
read_by_user | smallint | default 0
source | smallint | default 0
label_category_id | bigint |
label_id | bigint |
csat_response_id | bigint |
process_activity_fragments | jsonb |
created | timestamp without time zone | not null
updated | timestamp without time zone |
rule_id | bigint |
marketing_reply_id | bigint |
delivered_at | timestamp without time zone |
reply_fragments | jsonb |
status_fragment | jsonb |
internal_meta | jsonb |
interaction_id | text |
do_not_translate | boolean |
should_translate | integer |
in_reply_to | jsonb |
Indexes:
"process_activity_pkey" PRIMARY KEY, btree (process_activity_id)
"fki_process_activity_konotor_user_user_id" btree (process_activity_user_id) WITH (fillfactor='70')
"process_activity_process_instance_id_app_id_created_idx" btree (process_instance_id, app_id, created) WITH (fillfactor='70')
"process_activity_process_instance_id_app_id_read_by_user_created_idx" btree (process_instance_id, app_id, read_by_user, created) WITH (fillfactor='70')
"process_activity_process_instance_id_idx" btree (process_instance_id) WITH (fillfactor='70')
Table "public.process_activity"
Column | Type | Modifiers
--------------------+-----------------------------+----------------------------
process_activity_id | bigint | not null default next_id()
process_activity_type | smallint | not null
voice_url | text |
process_activity_user_id | bigint | not null
app_id | bigint | not null
process_instance_id | bigint | not null
alias | text | not null
read_by_user | smallint | default 0
source | smallint | default 0
label_category_id | bigint |
label_id | bigint |
csat_response_id | bigint |
process_activity_fragments | jsonb |
created | timestamp without time zone | not null
updated | timestamp without time zone |
rule_id | bigint |
marketing_reply_id | bigint |
delivered_at | timestamp without time zone |
reply_fragments | jsonb |
status_fragment | jsonb |
internal_meta | jsonb |
interaction_id | text |
do_not_translate | boolean |
should_translate | integer |
in_reply_to | jsonb |
Indexes:
"process_activity_pkey" PRIMARY KEY, btree (process_activity_id)
"fki_process_activity_konotor_user_user_id" btree (process_activity_user_id) WITH (fillfactor='70')
"process_activity_process_instance_id_app_id_created_idx" btree (process_instance_id, app_id, created) WITH (fillfactor='70')
"process_activity_process_instance_id_app_id_read_by_user_created_idx" btree (process_instance_id, app_id, read_by_user, created) WITH (fillfactor='70')
"process_activity_process_instance_id_idx" btree (process_instance_id) WITH (fillfactor='70')
process_instance
Table "public.process_instance"
Column | Type | Modifiers
-------------------------+-----------------------------+-----------------------------
process_instance_id | bigint | not null default next_id()
process_instance_alias | text | not null
app_id | bigint | not null
user_id | bigint | not null
Indexes:
"process_instance_pkey" PRIMARY KEY, btree (process_instance_id)
"fki_conv_konotor_user_user_id" btree (user_id) WITH (fillfactor='70')
Table "public.process_instance"
Column | Type | Modifiers
-------------------------+-----------------------------+-----------------------------
process_instance_id | bigint | not null default next_id()
process_instance_alias | text | not null
app_id | bigint | not null
user_id | bigint | not null
Indexes:
"process_instance_pkey" PRIMARY KEY, btree (process_instance_id)
"fki_conv_konotor_user_user_id" btree (user_id) WITH (fillfactor='70')
Regards, Amarendra
On Fri, May 8, 2020 at 12:01 AM Virendra Kumar <viru_7683@yahoo.com> wrote:
Sending table structure with indexes might help little further in understanding.Regards,VirendraOn Thursday, May 7, 2020, 11:08:14 AM PDT, Amarendra Konda <amar.vijaya@gmail.com> wrote:Hi David,In earlier reply, Over looked another condition, hence please ignore that oneHere is the correct one with all the needed conditions. According to the latest one, exists also not limiting rows from the process_activity table.EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS ( SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND pi.process_instance_id = pa.process_instance_id AND pi.user_id = '137074931866340') ORDER BY pa.process_instance_id, pa.created limit 50;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------
Limit (cost=1079.44..1079.52 rows=32 width=24) (actual time=85.747..85.777 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Sort (cost=1079.44..1079.52 rows=32 width=24) (actual time=85.745..85.759 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Sort Key: pa.process_instance_id, pa.created
Sort Method: top-N heapsort Memory: 28kB
Buffers: shared hit=43070
-> Nested Loop (cost=1.14..1078.64 rows=32 width=24) (actual time=0.025..72.115 rows=47011 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=43070
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=16) (actual time=0.010..0.015 rows=2 loops=1)
Output: pi.app_id, pi.process_instance_id
Index Cond: (c.user_id = '137074931866340'::bigint)
Filter: (c.app_id = '126502930200650'::bigint)
Buffers: shared hit=5
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1061.62 rows=1436 width=32) (actual time=0.011..20.320 rows=23506 loops=2)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_respons
e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_tr
anslate, pa.in_reply_to
Index Cond: ((m.process_instance_id = pi.process_instance_id) AND (m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=43065
Planning time: 0.455 ms
Execution time: 85.830 msOn Thu, May 7, 2020 at 11:19 PM Amarendra Konda <amar.vijaya@gmail.com> wrote:Hi David,Thanks for the reply.This has optimized number of rows.Can you please explain, why it is getting more columns in output, even though we have asked for only one column ?EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS ( SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND pi.user_id = '137074931866340') ORDER BY pa.process_instance_id,m.created limit 50;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------
Limit (cost=1.14..37.39 rows=50 width=24) (actual time=821.283..891.629 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=274950
-> Nested Loop Semi Join (cost=1.14..266660108.78 rows=367790473 width=24) (actual time=821.282..891.607 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_instance_id, pa.created
Buffers: shared hit=274950
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..262062725.21 rows=367790473 width=32) (actual time=821.253..891.517 rows=50 loops=1)
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_response_id,
m.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_translat
e, pa.in_reply_to
Index Cond: ((m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=274946
-> Materialize (cost=0.43..2.66 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=50)
Output: pi.app_id
Buffers: shared hit=4
-> Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual time=0.020..0.020 rows=1 loops=1)
Output: pi.app_id
Index Cond: (pi.user_id = '137074931866340'::bigint)
Filter: (pi.app_id = '126502930200650'::bigint)
Buffers: shared hit=4
Planning time: 0.297 ms
Execution time: 891.686 ms
(20 rows)On Thu, May 7, 2020 at 9:17 PM David G. Johnston <david.g.johnston@gmail.com> wrote:On Thu, May 7, 2020 at 7:40 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:On 5/7/20 4:19 AM, Amarendra Konda wrote:
> Hi,
>
> PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled
> by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
>
> We have noticed huge difference interms of execution plan ( response
> time) , When we pass the direct values Vs inner query to IN clause.
>
> High level details of the use case are as follows
>
> * As part of the SQL there are 2 tables named Process_instance
> (master) and Process_activity ( child)
> * Wanted to fetch TOP 50 rows from Process_activity table for the
> given values of the Process_instance.
> * When we used Inner Join / Inner query ( query1) between parent
> table and child table , LIMIT is not really taking in to account.
> Instead it is fetching more rows and columns that required, and
> finally limiting the result
It is doing what you told it to do which is SELECT all
process_instance_i's for user_id='317079413683604' and app_id =
'427380312000560' and then filtering further. I am going to guess that
if you run the inner query alone you will find it returns ~23496 rows.
You might have better results if you an actual join between
process_activity and process_instance. Something like below(obviously
not tested):What the OP seems to want is a semi-join:(not tested)SELECT pa.process_activity_idFROM process_activity pa WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 00:00:00'AND EXISTS (SELECT 1 FROM process_instance pi WHERE pi.app_id = pa.app_id AND pi.user_id = '317079413683604')ORDER BYpa.process_instance_id,pa.created limit 50;I'm unsure exactly how this will impact the plan choice but it should be an improvement, and in any case more correctly defines what it is you are looking for.David J.
Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
From
"David G. Johnston"
Date:
On Thu, May 7, 2020 at 10:49 AM Amarendra Konda <amar.vijaya@gmail.com> wrote:
Can you please explain, why it is getting more columns in output, even though we have asked for only one column ?
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_response_id,
m.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_translat
e, pa.in_reply_to
Not knowing the source code in this area at all...
I'm pretty sure its because it doesn't matter. The executor retrieves data "pages", 8k blocks containing multiple records, then extracts specific full tuples from there. At that point its probably just data pointers being passed around. Its not until the end that the planner/executor has to decide which subset of columns to return to the user, or when a new tuple structure has to be created anyway (say because of joining), maybe, does it take the effort of constructing a minimally necessary output column set.
David J.
Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
From
"David G. Johnston"
Date:
On Thu, May 7, 2020 at 10:49 AM Amarendra Konda <amar.vijaya@gmail.com> wrote:
Can you please explain, why it is getting more columns in output, even though we have asked for only one column ?
Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_response_id,
m.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_translat
e, pa.in_reply_to
Not knowing the source code in this area at all...
I'm pretty sure its because it doesn't matter. The executor retrieves data "pages", 8k blocks containing multiple records, then extracts specific full tuples from there. At that point its probably just data pointers being passed around. Its not until the end that the planner/executor has to decide which subset of columns to return to the user, or when a new tuple structure has to be created anyway (say because of joining), maybe, does it take the effort of constructing a minimally necessary output column set.
David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes: > On Thu, May 7, 2020 at 10:49 AM Amarendra Konda <amar.vijaya@gmail.com> > wrote: >> Can you please explain, why it is getting more columns in output, even >> though we have asked for only one column ? >> * Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, >> pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, >> pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, >> pa.csat_response_id, m.process_activity_fragments, pa.created, pa.updated, >> pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, >> pa.status_fragment, pa.internal_meta, pa.interaction_id, >> pa.do_not_translate, pa.should_translate, pa.in_reply_to* > Not knowing the source code in this area at all... > I'm pretty sure its because it doesn't matter. It's actually intentional, to save a projection step within that plan node. We'll discard the extra columns once it matters, at some higher plan level. (There have been some debates on -hackers about whether this optimization is still worth anything, given all the executor improvements that have been made since it went in. But it was clearly a win at the time.) regards, tom lane
"David G. Johnston" <david.g.johnston@gmail.com> writes: > On Thu, May 7, 2020 at 10:49 AM Amarendra Konda <amar.vijaya@gmail.com> > wrote: >> Can you please explain, why it is getting more columns in output, even >> though we have asked for only one column ? >> * Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, >> pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, >> pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, >> pa.csat_response_id, m.process_activity_fragments, pa.created, pa.updated, >> pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, >> pa.status_fragment, pa.internal_meta, pa.interaction_id, >> pa.do_not_translate, pa.should_translate, pa.in_reply_to* > Not knowing the source code in this area at all... > I'm pretty sure its because it doesn't matter. It's actually intentional, to save a projection step within that plan node. We'll discard the extra columns once it matters, at some higher plan level. (There have been some debates on -hackers about whether this optimization is still worth anything, given all the executor improvements that have been made since it went in. But it was clearly a win at the time.) regards, tom lane
Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
From
"David G. Johnston"
Date:
On Thu, May 7, 2020 at 11:07 AM Amarendra Konda <amar.vijaya@gmail.com> wrote:
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS ( SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND pi.process_instance_id = pa.process_instance_id AND pi.user_id = '137074931866340') ORDER BY pa.process_instance_id, pa.created limit 50;
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1061.62 rows=1436 width=32) (actual time=0.011..20.320 rows=23506 loops=2)
> Index Cond: ((m.process_instance_id = pi.process_instance_id) AND (m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
I suppose during the nested loop the inner index scan could limit itself to the first 50 entries it finds (since the first two index columns are being held constant on each scan, m.created should define the traversal order...) so that the output of the nested loop ends up being (max 2 x 50) 100 entries which are then sorted and only the top 50 returned.
Whether the executor could but isn't doing that here or isn't programmed to do that (or my logic is totally off) I do not know.
David J.
Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
From
"David G. Johnston"
Date:
On Thu, May 7, 2020 at 11:07 AM Amarendra Konda <amar.vijaya@gmail.com> wrote:
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS ( SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND pi.process_instance_id = pa.process_instance_id AND pi.user_id = '137074931866340') ORDER BY pa.process_instance_id, pa.created limit 50;
-> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa (cost=0.70..1061.62 rows=1436 width=32) (actual time=0.011..20.320 rows=23506 loops=2)
> Index Cond: ((m.process_instance_id = pi.process_instance_id) AND (m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
I suppose during the nested loop the inner index scan could limit itself to the first 50 entries it finds (since the first two index columns are being held constant on each scan, m.created should define the traversal order...) so that the output of the nested loop ends up being (max 2 x 50) 100 entries which are then sorted and only the top 50 returned.
Whether the executor could but isn't doing that here or isn't programmed to do that (or my logic is totally off) I do not know.
David J.
Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
From
David Rowley
Date:
On Fri, 8 May 2020 at 10:00, David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Thu, May 7, 2020 at 11:07 AM Amarendra Konda <amar.vijaya@gmail.com> wrote: >> >> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activitypa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS ( SELECT 1 FROMprocess_instance pi where pi.app_id = pa.app_id AND pi.process_instance_id = pa.process_instance_id AND pi.user_id ='137074931866340') ORDER BY pa.process_instance_id, pa.created limit 50; >> >> >> -> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activitypa (cost=0.70..1061.62 rows=1436 width=32) (actual time=0.011..20.320 rows=23506 loops=2) > > > Index Cond: ((m.process_instance_id = pi.process_instance_id) AND (m.app_id = '126502930200650'::bigint) AND (m.created> '1970-01-01 00:00:00'::timestamp without time zone)) > > I suppose during the nested loop the inner index scan could limit itself to the first 50 entries it finds (since the firsttwo index columns are being held constant on each scan, m.created should define the traversal order...) so that theoutput of the nested loop ends up being (max 2 x 50) 100 entries which are then sorted and only the top 50 returned. > > Whether the executor could but isn't doing that here or isn't programmed to do that (or my logic is totally off) I do notknow. I think the planner is likely not putting the process_activity table on the outer side of the nested loop join due to the poor row estimates. If it knew that so many rows would match the join then it likely would have done that to save from having to perform the sort at all. However, because the planner has put the process_instance on the outer side of the nested loop join, it's the pathkeys from that path that the nested loop node has, which is not the same as what the ORDER BY needs, so the planner must add a sort step, which means that all rows from the nested loop plan must be read so that they can be sorted. It might be worth trying: create index on process_instance (user_id,app_id); as that might lower the cost of performing the join in the opposite order and have the planner prefer that order instead. If doing that, the OP could then ditch the fki_conv_konotor_user_user_id index to save space. If that's not enough to convince the planner that the opposite order is better then certainly SET enable_sort TO off; would. David
Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
From
David Rowley
Date:
On Fri, 8 May 2020 at 10:00, David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Thu, May 7, 2020 at 11:07 AM Amarendra Konda <amar.vijaya@gmail.com> wrote: >> >> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id FROM process_activitypa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS ( SELECT 1 FROMprocess_instance pi where pi.app_id = pa.app_id AND pi.process_instance_id = pa.process_instance_id AND pi.user_id ='137074931866340') ORDER BY pa.process_instance_id, pa.created limit 50; >> >> >> -> Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activitypa (cost=0.70..1061.62 rows=1436 width=32) (actual time=0.011..20.320 rows=23506 loops=2) > > > Index Cond: ((m.process_instance_id = pi.process_instance_id) AND (m.app_id = '126502930200650'::bigint) AND (m.created> '1970-01-01 00:00:00'::timestamp without time zone)) > > I suppose during the nested loop the inner index scan could limit itself to the first 50 entries it finds (since the firsttwo index columns are being held constant on each scan, m.created should define the traversal order...) so that theoutput of the nested loop ends up being (max 2 x 50) 100 entries which are then sorted and only the top 50 returned. > > Whether the executor could but isn't doing that here or isn't programmed to do that (or my logic is totally off) I do notknow. I think the planner is likely not putting the process_activity table on the outer side of the nested loop join due to the poor row estimates. If it knew that so many rows would match the join then it likely would have done that to save from having to perform the sort at all. However, because the planner has put the process_instance on the outer side of the nested loop join, it's the pathkeys from that path that the nested loop node has, which is not the same as what the ORDER BY needs, so the planner must add a sort step, which means that all rows from the nested loop plan must be read so that they can be sorted. It might be worth trying: create index on process_instance (user_id,app_id); as that might lower the cost of performing the join in the opposite order and have the planner prefer that order instead. If doing that, the OP could then ditch the fki_conv_konotor_user_user_id index to save space. If that's not enough to convince the planner that the opposite order is better then certainly SET enable_sort TO off; would. David