On 5/7/20 4:19 AM, Amarendra Konda wrote: > Hi, > > PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled > by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit > > We have noticed huge difference interms of execution plan ( response > time) , When we pass the direct values Vs inner query to IN clause. > > High level details of the use case are as follows > > * As part of the SQL there are 2 tables named Process_instance > (master) and Process_activity ( child) > * Wanted to fetch TOP 50 rows from Process_activity table for the > given values of the Process_instance. > * When we used Inner Join / Inner query ( query1) between parent > table and child table , LIMIT is not really taking in to account. > Instead it is fetching more rows and columns that required, and > finally limiting the result
It is doing what you told it to do which is SELECT all process_instance_i's for user_id='317079413683604' and app_id = '427380312000560' and then filtering further. I am going to guess that if you run the inner query alone you will find it returns ~23496 rows. You might have better results if you an actual join between process_activity and process_instance. Something like below(obviously not tested):
What the OP seems to want is a semi-join:
(not tested)
SELECT pa.process_activity_id
FROM process_activity pa WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 00:00:00'
AND EXISTS (
SELECT 1 FROM process_instance pi WHERE pi.app_id = pa.app_id AND pi.user_id = '317079413683604'
)
ORDER BY
pa.process_instance_id,
pa.created limit 50;
I'm unsure exactly how this will impact the plan choice but it should be an improvement, and in any case more correctly defines what it is you are looking for.