Home > mailing lists

Re: Execution time from >1s -> 80m+ when extra columns added in SELECT for sub-query - Mailing list pgsql-performance

From	Tom Lane
Subject	Re: Execution time from >1s -> 80m+ when extra columns added in SELECT for sub-query
Date	May 19, 2020 00:58:48
Msg-id	19695.1589839128@sss.pgh.pa.us Whole thread Raw
In response to	Re: Execution time from >1s -> 80m+ when extra columns added inSELECT for sub-query (A Guy Named Ryan <aguynamedryan@gmail.com>)
List	pgsql-performance

Tree view

A Guy Named Ryan <aguynamedryan@gmail.com> writes:
> Why would the planner switch plans so drastically given that all I'm
> doing is including a few extra columns in the subselect, particularly
> when those columns are discarded by the super? parent? subselect

The problem is that the columns you're adding *don't belong to that
table*.  Per your schema dump,
jtemp1c37l3b_baseline_windows_with_collections only contains the columns
person_id and uuid.  So when you write

                                SELECT
                                    "uuid" AS "uuid",
                                    CAST(NULL AS float) AS "drug_amount",
                                    CAST(NULL AS text) AS
"drug_amount_units",
                                    CAST(NULL AS bigint) AS
"drug_days_supply",
                                    CAST(NULL AS text) AS "drug_name",
                                    CAST(NULL AS float) AS "drug_quantity",
                                    CAST(NULL AS integer) AS "window_id",
                                    "person_id" AS "person_id",
                                    "criterion_id" AS "criterion_id",
                                    "criterion_table" AS "criterion_table",
                                    "criterion_domain" AS
"criterion_domain",
                                    "start_date" AS "start_date",
                                    "end_date" AS "end_date"
                                FROM
"jigsaw_temp"."jtemp1c37l3b_baseline_windows_with_collections"

those are the only two columns that are "legitimately" part of that bottom
sub-select, and the others are outer references to
jtemp1c37l3b_baseline_windows_after_inclusion.  That's legal per SQL,
but it makes the EXISTS into a correlated sub-select, which is something
we can't turn into a semijoin.

Indeed, the unreferenced columns do get thrown away later, but that
doesn't happen until well past the point where the join restructuring
decisions are made (and there are good reasons for that ordering of
operations).

Basically I'd write this off as "broken SQL code generator".  If it
doesn't understand the difference between a local reference and an
outer reference, you shouldn't be letting it near your database.
That sort of fundamental misunderstanding often leads to incorrect
query results, never mind whether the query is fast or not.

            regards, tom lane

pgsql-performance by date:

From: A Guy Named Ryan
Date: 18 May 2020, 23:49:39
Subject: Re: Execution time from >1s -> 80m+ when extra columns added inSELECT for sub-query

From: Piotr Włodarczyk
Date: 20 May 2020, 10:30:52
Subject: OOM Killer kills PostgreSQL

Re: Execution time from >1s -> 80m+ when extra columns added in SELECT for sub-query - Mailing list pgsql-performance

Previous

Next