Thread: Need assistance in converting subqueries to joins
Hello Tech gents!
I am sorry if I am asking the wrong question to this group, but wanted assistance in converting a query replacing subqueries with joins.
Please find the query below (whose cost is very high):
select em_exists_id from IS_SEC_FILT WHERE (IS_SEC_FILT_GUID) NOT IN (SELECT IS_OBJ_GUID FROM TMP_IS_SEC_FILT T0, IS_PROJ P0 WHERE T0.IS_PROJ_GUID = P0.IS_PROJ_GUID AND P0.IS_PROJ_ID = IS_SEC_FILT.IS_PROJ_ID) AND (IS_PROJ_ID) IN (SELECT IS_PROJ_ID FROM IS_PROJ P0, TMP_IS_SEC_FILT T0, EM_MD R0 WHERE T0.IS_REPOSITORY_GUID = R0.REP_GUID AND T0.IS_PROJ_GUID = P0.IS_PROJ_GUID AND P0.IS_REPOSITORY_ID = R0.REP_ID);
Regards
Siraj
On 9/19/24 21:07, Siraj G wrote:
> Hello Tech gents!
>
> I am sorry if I am asking the wrong question to this group, but wanted
> assistance in converting a query replacing subqueries with joins.
>
> Please find the query below (whose cost is very high):

Add the output of the EXPLAIN ANALYZE for the query.

>
> select em_exists_id from IS_SEC_FILT WHERE (IS_SEC_FILT_GUID) NOT IN
> (SELECT IS_OBJ_GUID FROM TMP_IS_SEC_FILT T0, IS_PROJ P0 WHERE
> T0.IS_PROJ_GUID = P0.IS_PROJ_GUID AND P0.IS_PROJ_ID =
> IS_SEC_FILT.IS_PROJ_ID) AND (IS_PROJ_ID) IN (SELECT IS_PROJ_ID FROM
> IS_PROJ P0, TMP_IS_SEC_FILT T0, EM_MD R0 WHERE T0.IS_REPOSITORY_GUID =
> R0.REP_GUID AND T0.IS_PROJ_GUID = P0.IS_PROJ_GUID AND
> P0.IS_REPOSITORY_ID = R0.REP_ID);

For future reference formatting the query here:

https://sqlformat.darold.net/

helps get it into a form that is easier to follow:

SELECT
    em_exists_id
FROM
    IS_SEC_FILT
WHERE (IS_SEC_FILT_GUID)
    NOT IN (
        SELECT
            IS_OBJ_GUID
        FROM
            TMP_IS_SEC_FILT T0,
            IS_PROJ P0
        WHERE
            T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
            AND P0.IS_PROJ_ID = IS_SEC_FILT.IS_PROJ_ID)
    AND (IS_PROJ_ID) IN (
        SELECT
            IS_PROJ_ID
        FROM
            IS_PROJ P0,
            TMP_IS_SEC_FILT T0,
            EM_MD R0
        WHERE
            T0.IS_REPOSITORY_GUID = R0.REP_GUID
            AND T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
            AND P0.IS_REPOSITORY_ID = R0.REP_ID);

>
> Regards
> Siraj

--
Adrian Klaver
adrian.klaver@aklaver.com
Hello Adrian!
Please find below the query in the format and its execution plan:
SELECT
    em_exists_id
FROM
    IS_SEC_FILT
WHERE (IS_SEC_FILT_GUID)
    NOT IN (
        SELECT
            IS_OBJ_GUID
        FROM
            TMP_IS_SEC_FILT T0,
            IS_PROJ P0
        WHERE
            T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
            AND P0.IS_PROJ_ID = IS_SEC_FILT.IS_PROJ_ID)
    AND (IS_PROJ_ID) IN (
        SELECT
            IS_PROJ_ID
        FROM
            IS_PROJ P0,
            TMP_IS_SEC_FILT T0,
            EM_MD R0
        WHERE
            T0.IS_REPOSITORY_GUID = R0.REP_GUID
            AND T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
            AND P0.IS_REPOSITORY_ID = R0.REP_ID);
Query plan:
-> Aggregate: count(0) (cost=2284.32 rows=1988) (actual time=22602.583..22602.584 rows=1 loops=1)
    -> Remove duplicate (P0, IS_SEC_FILT) rows using temporary table (weedout) (cost=2085.53 rows=1988) (actual time=0.321..22600.652 rows=10298 loops=1)
        -> Filter: <in_optimizer>(IS_SEC_FILT.IS_SEC_FILT_GUID,<exists>(select #2) is false) (cost=2085.53 rows=1988) (actual time=0.315..22433.412 rows=514900 loops=1)
            -> Inner hash join (IS_SEC_FILT.IS_PROJ_ID = P0.IS_PROJ_ID) (cost=2085.53 rows=1988) (actual time=0.188..96.362 rows=517350 loops=1)
                -> Index scan on IS_SEC_FILT using IS_SEC_FILT_PK (cost=28.84 rows=19879) (actual time=0.019..7.386 rows=20086 loops=1)
                -> Hash
                    -> Nested loop inner join (cost=8.05 rows=1) (actual time=0.064..0.132 rows=50 loops=1)
                        -> Inner hash join (T0.IS_REPOSITORY_GUID = R0.REP_GUID) (cost=1.70 rows=1) (actual time=0.047..0.094 rows=50 loops=1)
                            -> Filter: (T0.IS_PROJ_GUID is not null) (cost=0.38 rows=5) (actual time=0.010..0.041 rows=50 loops=1)
                                -> Table scan on T0 (cost=0.38 rows=50) (actual time=0.010..0.037 rows=50 loops=1)
                            -> Hash
                                -> Filter: (R0.REP_ID is not null) (cost=0.45 rows=2) (actual time=0.022..0.025 rows=2 loops=1)
                                    -> Table scan on R0 (cost=0.45 rows=2) (actual time=0.021..0.023 rows=2 loops=1)
                        -> Filter: (P0.IS_REPOSITORY_ID = R0.REP_ID) (cost=0.63 rows=1) (actual time=0.001..0.001 rows=1 loops=50)
                            -> Single-row index lookup on P0 using IS_PROJ_PK (IS_PROJ_GUID=T0.IS_PROJ_GUID, IS_REPOSITORY_ID=R0.REP_ID) (cost=0.63 rows=1) (actual time=0.000..0.000 rows=1 loops=50)
            -> Select #2 (subquery in condition; dependent)
                -> Limit: 1 row(s) (cost=5.98 rows=1) (actual time=0.043..0.043 rows=0 loops=517350)
                    -> Filter: <is_not_null_test>(T0.IS_OBJ_GUID) (cost=5.98 rows=1) (actual time=0.043..0.043 rows=0 loops=517350)
                        -> Filter: ((<cache>(IS_SEC_FILT.IS_SEC_FILT_GUID) = T0.IS_OBJ_GUID) or (T0.IS_OBJ_GUID is null)) (cost=5.98 rows=1) (actual time=0.042..0.042 rows=0 loops=517350)
                            -> Inner hash join (T0.IS_PROJ_GUID = P0.IS_PROJ_GUID) (cost=5.98 rows=1) (actual time=0.004..0.038 rows=50 loops=517350)
                                -> Table scan on T0 (cost=0.35 rows=50) (actual time=0.001..0.022 rows=50 loops=517350)
                                -> Hash
                                    -> Single-row index lookup on P0 using PRIMARY (IS_PROJ_ID=IS_SEC_FILT.IS_PROJ_ID) (cost=0.72 rows=1) (actual time=0.001..0.001 rows=1 loops=517350)
On Fri, Sep 20, 2024 at 9:49 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 9/19/24 21:07, Siraj G wrote:
> Hello Tech gents!
>
> I am sorry if I am asking the wrong question to this group, but wanted
> assistance in converting a query replacing subqueries with joins.
>
> Please find the query below (whose cost is very high):
Add the output of the EXPLAIN ANALYZE for the query.
>
> select em_exists_id from IS_SEC_FILT WHERE (IS_SEC_FILT_GUID) NOT IN
> (SELECT IS_OBJ_GUID FROM TMP_IS_SEC_FILT T0, IS_PROJ P0 WHERE
> T0.IS_PROJ_GUID = P0.IS_PROJ_GUID AND P0.IS_PROJ_ID =
> IS_SEC_FILT.IS_PROJ_ID) AND (IS_PROJ_ID) IN (SELECT IS_PROJ_ID FROM
> IS_PROJ P0, TMP_IS_SEC_FILT T0, EM_MD R0 WHERE T0.IS_REPOSITORY_GUID =
> R0.REP_GUID AND T0.IS_PROJ_GUID = P0.IS_PROJ_GUID AND
> P0.IS_REPOSITORY_ID = R0.REP_ID);
For future reference formatting the query here:
https://sqlformat.darold.net/
helps get it into a form that is easier to follow:
SELECT
    em_exists_id
FROM
    IS_SEC_FILT
WHERE (IS_SEC_FILT_GUID)
    NOT IN (
        SELECT
            IS_OBJ_GUID
        FROM
            TMP_IS_SEC_FILT T0,
            IS_PROJ P0
        WHERE
            T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
            AND P0.IS_PROJ_ID = IS_SEC_FILT.IS_PROJ_ID)
    AND (IS_PROJ_ID) IN (
        SELECT
            IS_PROJ_ID
        FROM
            IS_PROJ P0,
            TMP_IS_SEC_FILT T0,
            EM_MD R0
        WHERE
            T0.IS_REPOSITORY_GUID = R0.REP_GUID
            AND T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
            AND P0.IS_REPOSITORY_ID = R0.REP_ID);
>
> Regards
> Siraj
--
Adrian Klaver
adrian.klaver@aklaver.com
Siraj G <tosiraj.g@gmail.com> writes:
> Please find below the query in the format and its execution plan:

[ blink... ] I'm not sure what you are using there, but it is
*not* Postgres. There are assorted entries in the execution plan
that community Postgres has never heard of, such as

> -> Remove duplicate (P0, IS_SEC_FILT) rows using temporary table
> (weedout) (cost=2085.53 rows=1988) (actual time=0.321..22600.652
> rows=10298 loops=1)

> -> Single-row index lookup on P0 using IS_PROJ_PK
> (IS_PROJ_GUID=T0.IS_PROJ_GUID, IS_REPOSITORY_ID=R0.REP_ID) (cost=0.63
> rows=1) (actual time=0.000..0.000 rows=1 loops=50)

Maybe this is RDS, or Aurora, or Greenplum, or one of many other
commercial forks of Postgres? In any case you'd get more on-point
advice from their support forums than from the PG community.
It looks like this is a fork that has installed its own underlying
table engine, meaning that what we know about performance may not
be terribly relevant.

regards, tom lane
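For reference, the rewrite asked about at the top of the thread might look roughly like the sketch below. In the posted plan, the dependent subquery (Select #2) runs 517350 times at about 0.043 ms per loop, which is roughly 22 s and accounts for nearly all of the elapsed time, so the NOT IN is the part worth turning into a join. The sketch replaces it with a LEFT JOIN ... IS NULL anti-join and the IN with a join against a DISTINCT derived table. Everything outside the thread is an assumption: the column types, the sample rows, and the aliases F, OK_PROJ and HIT are invented for illustration, and SQLite (through Python's standard sqlite3 module) stands in for the poster's database only so that the two forms can be compared on toy data. The rewrite also assumes IS_OBJ_GUID and IS_SEC_FILT_GUID are never NULL; the plan's "T0.IS_OBJ_GUID is null" filter suggests the column is nullable, and NOT IN treats NULLs differently from an anti-join.

```python
import sqlite3

# Toy schema: table and column names come from the thread; the types and
# the sample rows are invented for illustration only.
SCHEMA = """
CREATE TABLE EM_MD (REP_ID INTEGER, REP_GUID TEXT);
CREATE TABLE IS_PROJ (IS_PROJ_ID INTEGER PRIMARY KEY, IS_PROJ_GUID TEXT,
                      IS_REPOSITORY_ID INTEGER);
CREATE TABLE TMP_IS_SEC_FILT (IS_OBJ_GUID TEXT NOT NULL, IS_PROJ_GUID TEXT,
                              IS_REPOSITORY_GUID TEXT);
CREATE TABLE IS_SEC_FILT (em_exists_id INTEGER, IS_SEC_FILT_GUID TEXT NOT NULL,
                          IS_PROJ_ID INTEGER);
INSERT INTO EM_MD VALUES (1, 'rep-a'), (2, 'rep-b');
INSERT INTO IS_PROJ VALUES (10, 'proj-x', 1), (20, 'proj-y', 2), (30, 'proj-z', 1);
INSERT INTO TMP_IS_SEC_FILT VALUES
    ('f1', 'proj-x', 'rep-a'), ('f1', 'proj-x', 'rep-a'),
    ('f9', 'proj-x', 'rep-a'), ('f3', 'proj-y', 'rep-a');
INSERT INTO IS_SEC_FILT VALUES
    (100, 'f1', 10), (101, 'f2', 10), (102, 'f3', 20), (103, 'f4', 20),
    (104, 'f5', 30), (105, 'f9', 10), (106, 'f7', 10);
"""

# The query as posted in the thread (ORDER BY added so results are comparable).
ORIGINAL = """
SELECT em_exists_id FROM IS_SEC_FILT
WHERE (IS_SEC_FILT_GUID) NOT IN (
        SELECT IS_OBJ_GUID FROM TMP_IS_SEC_FILT T0, IS_PROJ P0
        WHERE T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
          AND P0.IS_PROJ_ID = IS_SEC_FILT.IS_PROJ_ID)
  AND (IS_PROJ_ID) IN (
        SELECT IS_PROJ_ID FROM IS_PROJ P0, TMP_IS_SEC_FILT T0, EM_MD R0
        WHERE T0.IS_REPOSITORY_GUID = R0.REP_GUID
          AND T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
          AND P0.IS_REPOSITORY_ID = R0.REP_ID)
ORDER BY em_exists_id
"""

# Join-based sketch. Assumes the GUID columns are never NULL.
REWRITE = """
SELECT F.em_exists_id
FROM IS_SEC_FILT F
-- IN (...) becomes a join; DISTINCT keeps one row per qualifying project
JOIN (SELECT DISTINCT P0.IS_PROJ_ID
      FROM IS_PROJ P0
      JOIN TMP_IS_SEC_FILT T0 ON T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
      JOIN EM_MD R0 ON R0.REP_GUID = T0.IS_REPOSITORY_GUID
                   AND R0.REP_ID = P0.IS_REPOSITORY_ID) OK_PROJ
  ON OK_PROJ.IS_PROJ_ID = F.IS_PROJ_ID
-- NOT IN (...) becomes an anti-join: keep rows with no matching HIT row
LEFT JOIN (SELECT P1.IS_PROJ_ID, T1.IS_OBJ_GUID
           FROM TMP_IS_SEC_FILT T1
           JOIN IS_PROJ P1 ON P1.IS_PROJ_GUID = T1.IS_PROJ_GUID) HIT
  ON HIT.IS_PROJ_ID = F.IS_PROJ_ID
 AND HIT.IS_OBJ_GUID = F.IS_SEC_FILT_GUID
WHERE HIT.IS_PROJ_ID IS NULL
ORDER BY F.em_exists_id
"""

def run(conn, sql):
    """Return the first column of every result row as a list."""
    return [row[0] for row in conn.execute(sql)]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
original_rows = run(conn, ORIGINAL)
rewritten_rows = run(conn, REWRITE)
print(original_rows, rewritten_rows)  # both lists are [101, 106] on this toy data
```

The DISTINCT in OK_PROJ plays the role of the "Remove duplicate ... (weedout)" step in the posted plan: without it, every matching TMP_IS_SEC_FILT row would repeat the outer row. Whether the real optimizer picks a better plan for this form can only be confirmed by running EXPLAIN ANALYZE on the actual system and data.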