Re: Execution time from >1s -> 80m+ when extra columns added inSELECT for sub-query - Mailing list pgsql-performance

From A Guy Named Ryan
Subject Re: Execution time from >1s -> 80m+ when extra columns added inSELECT for sub-query
Date
Msg-id CAAJz3y2QnyCAJOoSx0P5U9th0cAH1rtQ5nqC_UwqfDZGW9mdKg@mail.gmail.com
Whole thread Raw
In response to Re: Execution time from >1s -> 80m+ when extra columns added inSELECT for sub-query  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: Execution time from >1s -> 80m+ when extra columns added in SELECT for sub-query  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Thanks for responding!


On May 18, 2020 at 12:18:37 PM, Pavel Stehule
(pavel.stehule@gmail.com(mailto:pavel.stehule@gmail.com)) wrote:

> Hi
>
> It looks so in slow plan is some strange relations between subselects - the slow plan looks like plan for correlated
subquery,and it should be slow.
 

I'm not very saavy about query planning and have very little (if any)
idea of what I'm talking about.


Our dynamically generated SQL does generate a lot of "strange"
subselects. They're hard to avoid for what we're doing. I admit it
makes for some abnormal looking queries. That said, I think we've
revealed an interesting issue here.


Why would the planner switch plans so drastically given that all I'm
doing is including a few extra columns in the subselect, particularly
when those columns are discarded by the super? parent? subselect


I'd expect the query planner to know that only the uuid column of the
subselect is used and not bother to actually project any additional
columns. Also, my impression about the EXISTS operator is that it
would not really need to pull any column values from disk except where
those values would be used to determine the conditions for the EXISTS.


Maybe I'm misunderstanding how this query should be executed, but I
wanted to bring it to your attention because it seems like, though the
SQL is a bit crazy, it reveals some very inconsistent planning on
PostgreSQL's part and might be something to look into.

> Minimally you miss a index on column
> jtemp1c37l3b_baseline_windows_after_inclusion.uuid

Thanks for pointing this out!


This is an intermediate table we generate as part of a much larger
process and the table is only used once. I'm under the impression that
there's a trade-off between taking the time to first build an index
then run the query rather than just running the query that one time.

It's an interesting idea to build the indexes just to avoid poor query
plans and it's something I'll keep in mind if we run into other
queries that trigger poor performance and we're unable to work around
them some other way.


If you'd like, I can slap on index on those tables and re-run a few
queries, but again, even if performance improves in this one test
case, I'm not sure it'd convince us to start adding indexes to some or
all our tables as part of our process just to avoid this one bad plan
in this one query. We've run 15-20 of our larger processes and only
hit this situation in one query in one process. I imagine adding
indexes across the board might be a heavy-handed solution to work
around this issue.



pgsql-performance by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: Execution time from >1s -> 80m+ when extra columns added inSELECT for sub-query
Next
From: Tom Lane
Date:
Subject: Re: Execution time from >1s -> 80m+ when extra columns added in SELECT for sub-query