Thread: SOLVED - RE: Poor performance using CTE

SOLVED - RE: Poor performance using CTE

From
David Greco
Date:
-----Original Message-----
From: Andrew Dunstan [mailto:andrew@dunslane.net]
Sent: Wednesday, November 14, 2012 11:08 AM
To: David Greco
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Poor performance using CTE


On 11/14/2012 10:56 AM, David Greco wrote:
> You're right. I was translating an oracle query , but looks like PG will allow some syntax that is different. Trying
tofind entries in fedexinvoices where smp_pkg.get_invoice_charges(id) returns a record containing charge_name in
('ADDRESSCORRECTION CHARGE','ADDRESS CORRECTION'). Should return the fedexinvoices row and the row from
smp_pkg.get_invoice_chargesthat contains the address correction. 
>
>
> Something like this, though this is syntactically incorrect as smp_pkg.get_invoice_charges returns a set:
>
>
> select fedexinvoices.*, (smp_pkg.get_invoice_charges(id)).*
> from fedexinvoices
> WHERE
> trim(fedexinvoices.trackno)='799159791643'
> and
> (smp_pkg.get_invoice_charges(id)).charge_name IN ('ADDRESS CORRECTION
> CHARGE','ADDRESS CORRECTION')


First, please don't top-post when someone has replied underneath your post. It makes the thread totally unreadable. See
<http://idallen.com/topposting.html>

You could do something like this:

WITH invoices as
(
    select *
    from fedexinvoices
    where trim(fedexinvoices.trackno)='799159791643'
),

charges as
(
    SELECT fi2.id, smp_pkg.get_invoice_charges(fi2.id) charge_info
    from fedexinvoices fi2 join invoices i on i.id = f12.id
)

select invoices.*
from invoices
inner join charges on charges.id = invoices.id
     AND (charges.charge_info).charge_name IN ('ADDRESS CORRECTION CHARGE','ADDRESS CORRECTION')

;


Or probably something way simpler but I just did this fairly quickly and mechanically


cheers

andrew







Thanks, that did the trick. Though I'm still not clear as to why.






Re: SOLVED - RE: Poor performance using CTE

From
Tom Lane
Date:
David Greco <David_Greco@harte-hanks.com> writes:
> Thanks, that did the trick. Though I'm still not clear as to why.

PG treats WITH as an optimization fence --- the WITH query will be
executed pretty much as-is.  It may be that Oracle flattens the query
somehow; though if you're using black-box functions in both cases,
it's not obvious where the optimizer could get any purchase that way.

            regards, tom lane


Re: SOLVED - RE: Poor performance using CTE

From
Craig Ringer
Date:
On 11/15/2012 12:29 AM, Tom Lane wrote:
> David Greco <David_Greco@harte-hanks.com> writes:
>> Thanks, that did the trick. Though I'm still not clear as to why.
> PG treats WITH as an optimization fence --- the WITH query will be
> executed pretty much as-is.  It may be that Oracle flattens the query
> somehow; though if you're using black-box functions in both cases,
> it's not obvious where the optimizer could get any purchase that way.
>

I was looking through the latest spec drafts I have access to and
couldn't find any reference to Pg's optimisation-fence-for-CTEs
behaviour being required by the standard, though I've repeatedly seen it
said that there is such a requirement.

Do you know where it's specified?

All I can see is that the optimised result must have the same effect as
the original. That'd mean that wCTEs and CTE terms that use VOLATILE
functions or functions with side-effects couldn't be optimised into
other queries. Simple CTEs could be, though, and there are times I've
really wished I could use a CTE but I've had to use a set-returning
subquery to get reasonable plans.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: SOLVED - RE: Poor performance using CTE

From
Tom Lane
Date:
Craig Ringer <craig@2ndQuadrant.com> writes:
> I was looking through the latest spec drafts I have access to and
> couldn't find any reference to Pg's optimisation-fence-for-CTEs
> behaviour being required by the standard, though I've repeatedly seen it
> said that there is such a requirement.

I don't believe it's required by the standard (it's hard to see how it
could be, when query optimization is a topic outside the spec to start
with).  However, we allow INSERT/UPDATE/DELETE RETURNING inside WITH,
and for those I think you really need to treat WITH as an optimization
fence.  It's a lot more debatable for SELECT; there are some advantages
to providing a fence this way but there are definitely downsides too.
I could see adjusting that definition in the future, as we get more
experience with use of CTEs.

            regards, tom lane


Re: SOLVED - RE: Poor performance using CTE

From
Andrew Dunstan
Date:
On 11/14/2012 08:17 PM, Craig Ringer wrote:
> On 11/15/2012 12:29 AM, Tom Lane wrote:
>> David Greco <David_Greco@harte-hanks.com> writes:
>>> Thanks, that did the trick. Though I'm still not clear as to why.
>> PG treats WITH as an optimization fence --- the WITH query will be
>> executed pretty much as-is.  It may be that Oracle flattens the query
>> somehow; though if you're using black-box functions in both cases,
>> it's not obvious where the optimizer could get any purchase that way.
>>
> I was looking through the latest spec drafts I have access to and
> couldn't find any reference to Pg's optimisation-fence-for-CTEs
> behaviour being required by the standard, though I've repeatedly seen it
> said that there is such a requirement.
>
> Do you know where it's specified?
>
> All I can see is that the optimised result must have the same effect as
> the original. That'd mean that wCTEs and CTE terms that use VOLATILE
> functions or functions with side-effects couldn't be optimised into
> other queries. Simple CTEs could be, though, and there are times I've
> really wished I could use a CTE but I've had to use a set-returning
> subquery to get reasonable plans.


It cuts both ways. I have used CTEs a LOT precisely because this
behaviour lets me get better plans. Without that I'll be back to using
the "offset 0" hack.

cheers

andrew


Re: SOLVED - RE: Poor performance using CTE

From
Peter Geoghegan
Date:
On 15 November 2012 01:46, Andrew Dunstan <andrew@dunslane.net> wrote:
> It cuts both ways. I have used CTEs a LOT precisely because this behaviour
> lets me get better plans. Without that I'll be back to using the "offset 0"
> hack.

Is the "OFFSET 0" hack really so bad? We've been telling people to do
that for years, so it's already something that we've effectively
committed to.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: SOLVED - RE: Poor performance using CTE

From
Gavin Flower
Date:
On 15/11/12 15:03, Peter Geoghegan wrote:
> On 15 November 2012 01:46, Andrew Dunstan <andrew@dunslane.net> wrote:
>> It cuts both ways. I have used CTEs a LOT precisely because this behaviour
>> lets me get better plans. Without that I'll be back to using the "offset 0"
>> hack.
> Is the "OFFSET 0" hack really so bad? We've been telling people to do
> that for years, so it's already something that we've effectively
> committed to.
>
How about adding the keywords FENCED and NOT FENCED to the SQL
definition of CTE's - with FENCED being the default?


Cheers,
Gavin



Re: SOLVED - RE: Poor performance using CTE

From
Merlin Moncure
Date:
On Wed, Nov 14, 2012 at 8:03 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> On 15 November 2012 01:46, Andrew Dunstan <andrew@dunslane.net> wrote:
>> It cuts both ways. I have used CTEs a LOT precisely because this behaviour
>> lets me get better plans. Without that I'll be back to using the "offset 0"
>> hack.
>
> Is the "OFFSET 0" hack really so bad? We've been telling people to do
> that for years, so it's already something that we've effectively
> committed to.

IMSNHO, 'OFFSET 0' is completely unreadable black magic.  I agree with
Andrew: CTEs allow for manual composition of queries and can be the
best tool when the planner is outsmarting itself.  In the old days,
we'd extract data to a temp table and join against that: CTE are
essentially a formalization of that technique.  I like things the way
they are; if CTE are hurting your plan, that's an indication you're
using them inappropriately.

merlin


Re: SOLVED - RE: Poor performance using CTE

From
Claudio Freire
Date:
On Tue, Nov 20, 2012 at 4:22 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Wed, Nov 14, 2012 at 8:03 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
>> On 15 November 2012 01:46, Andrew Dunstan <andrew@dunslane.net> wrote:
>>> It cuts both ways. I have used CTEs a LOT precisely because this behaviour
>>> lets me get better plans. Without that I'll be back to using the "offset 0"
>>> hack.
>>
>> Is the "OFFSET 0" hack really so bad? We've been telling people to do
>> that for years, so it's already something that we've effectively
>> committed to.
>
> IMSNHO, 'OFFSET 0' is completely unreadable black magic.  I agree with
> Andrew: CTEs allow for manual composition of queries and can be the
> best tool when the planner is outsmarting itself.  In the old days,
> we'd extract data to a temp table and join against that: CTE are
> essentially a formalization of that technique.  I like things the way
> they are; if CTE are hurting your plan, that's an indication you're
> using them inappropriately.

I agree, **BUT**, I cannot imagine how pushing constraints to the CTE
(under adequate conditions) could be anything but beneficial.

It *could* just be a lack of imagination on my part. But if it were
not, then it'd be nice for it to be done automatically (since this
particular CTE behavior bites enough people already).


Re: SOLVED - RE: Poor performance using CTE

From
Jon Nelson
Date:
My perspective on this is that CTEs *should* be just like creating a
temporary table and then joining to it, but without the
materialization costs. In that respect, they seem like they should be
like nifty VIEWs. If I wanted the behavior of materialization and then
join, I'd do that explicitly with temporary tables, but using CTEs as
an explicit optimization barrier feels like the explaining away
surprising behavior.

As can be seen by the current conversation, not everyone is convinced
that CTEs ought to be an explicit optimization barrier, and setting
that behavior as somehow desirable or explicit (rather than merely an
implementation detail) feels shortsighted to me. I would be delighted
to find that in some future version of PostgreSQL, but  if that is not
to be, at the very least, the verbiage surrounding CTEs might want to
include (perhaps prominently) something along the lines of "CTEs are
currently an optimization barrier, but this is an implementation
detail and may change in future versions".  Perhaps even including a
small blurb about what an optimization barrier even means (my
understanding is that it merely forces materialization of that part of
the query).

That's just my perspective, coming at the use of CTEs not as a
PostgreSQL developer, but as somebody who learned about CTEs and
started using them - only to discover surprising behavior.

On Tue, Nov 20, 2012 at 1:22 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Wed, Nov 14, 2012 at 8:03 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
>> On 15 November 2012 01:46, Andrew Dunstan <andrew@dunslane.net> wrote:
>>> It cuts both ways. I have used CTEs a LOT precisely because this behaviour
>>> lets me get better plans. Without that I'll be back to using the "offset 0"
>>> hack.
>>
>> Is the "OFFSET 0" hack really so bad? We've been telling people to do
>> that for years, so it's already something that we've effectively
>> committed to.
>
> IMSNHO, 'OFFSET 0' is completely unreadable black magic.  I agree with
> Andrew: CTEs allow for manual composition of queries and can be the
> best tool when the planner is outsmarting itself.  In the old days,
> we'd extract data to a temp table and join against that: CTE are
> essentially a formalization of that technique.  I like things the way
> they are; if CTE are hurting your plan, that's an indication you're
> using them inappropriately.
>
> merlin
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance



--
Jon


Re: SOLVED - RE: Poor performance using CTE

From
Merlin Moncure
Date:
On Tue, Nov 20, 2012 at 1:53 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:
> As can be seen by the current conversation, not everyone is convinced
that CTEs ought to be an explicit optimization barrier

On Tue, Nov 20, 2012 at 1:26 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> It *could* just be a lack of imagination on my part. But if it were
> not, then it'd be nice for it to be done automatically (since this
> particular CTE behavior bites enough people already).

Sure.  I just find it personally hard to find a good demarcation line
between A: "queries where pushing quals through are universally
beneficial and wanted" and B: "queries where we are inserting an
explicit materialization step to avoid planner issues", particularly
where there is substantial overlap with between A and C: "queries that
are written with a CTE and arguably shouldn't be".

Put another way, I find CTE to express: 'this then that' where joins
express 'this with that'.  So current behavior is not surprising at
all. All that said, there could be a narrow class of low hanging cases
(such as the OP's) that could be sniped...I'm just skeptical.

merlin


Re: SOLVED - RE: Poor performance using CTE

From
Claudio Freire
Date:
On Tue, Nov 20, 2012 at 5:24 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Tue, Nov 20, 2012 at 1:53 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:
>> As can be seen by the current conversation, not everyone is convinced
> that CTEs ought to be an explicit optimization barrier
>
> On Tue, Nov 20, 2012 at 1:26 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> It *could* just be a lack of imagination on my part. But if it were
>> not, then it'd be nice for it to be done automatically (since this
>> particular CTE behavior bites enough people already).
>
> Sure.  I just find it personally hard to find a good demarcation line
> between A: "queries where pushing quals through are universally
> beneficial and wanted" and B: "queries where we are inserting an
> explicit materialization step to avoid planner issues", particularly
> where there is substantial overlap with between A and C: "queries that
> are written with a CTE and arguably shouldn't be".
>
> Put another way, I find CTE to express: 'this then that' where joins
> express 'this with that'.  So current behavior is not surprising at
> all. All that said, there could be a narrow class of low hanging cases
> (such as the OP's) that could be sniped...I'm just skeptical.

It could work very well towards CTE-including views, where the quals
cannot be added in the view but would be present when the view is
expanded in final queries.


Re: SOLVED - RE: Poor performance using CTE

From
Tom Lane
Date:
Jon Nelson <jnelson+pgsql@jamponi.net> writes:
> ... Perhaps even including a
> small blurb about what an optimization barrier even means (my
> understanding is that it merely forces materialization of that part of
> the query).

FWIW, it has nothing to do with materialization; it means that we don't
push conditions down into that subquery, nor pull subexpressions up out
of it, nor rearrange join order across the subquery boundary.  In short
the subquery is planned separately from the outer query.  But it could
then be run by the executor in the usual tuple-at-a-time fashion,
without materializing the whole subquery result.

It is true that CTEScan nodes materialize the subquery output (ie copy
it into a tuplestore), but that's to support multiple CTEScans reading
the same CTE.  One of the optimizations we *should* put in place
sometime is skipping the tuplestore if there's only one CTEScan on the
CTE.

            regards, tom lane


Re: SOLVED - RE: Poor performance using CTE

From
Craig Ringer
Date:
On 11/21/2012 03:53 AM, Jon Nelson wrote:
> My perspective on this is that CTEs *should* be just like creating a
> temporary table and then joining to it, but without the
> materialization costs. In that respect, they seem like they should be
> like nifty VIEWs. If I wanted the behavior of materialization and then
> join, I'd do that explicitly with temporary tables, but using CTEs as
> an explicit optimization barrier feels like the explaining away
> surprising behavior.
I agree, especially since that barrier isn't specified as standard, so
we're using a standard feature with a subtle quirk as a
database-specific optimisation trick. A hint, as it were, like OFFSET 0.

*(Dons asbestos underwear an dives for cover)*

My big problem with the status quo is that it breaks queries from other
databases, like MS SQL server, where CTEs are optimised. I see this
periodically on Stack Overflow, with people asking variants of "Why
does  PostgreSQL take 10,000 times longer to execute this query"? (not a
literal quote).

I really want to see this formalized and made explicit with `WITH
tablename AS MATERIALIZE (SELECT)` or similar.

Right now I often can't use CTEs to clean up hard-to-read queries
because of the optimisation barrier, so I have to create a temporary
view, temporary table, or use nested subqueries in FROM instead. Ugly.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: SOLVED - RE: Poor performance using CTE

From
Vitalii Tymchyshyn
Date:

I'd also add ANALYZED/NOT ANALYZED. This should force it behave like 'create table, analyze, select' with statistics used in second query plan.

P.S. defaults can be configurable.

20 лист. 2012 02:22, "Gavin Flower" <GavinFlower@archidevsys.co.nz> напис.
On 15/11/12 15:03, Peter Geoghegan wrote:
On 15 November 2012 01:46, Andrew Dunstan <andrew@dunslane.net> wrote:
It cuts both ways. I have used CTEs a LOT precisely because this behaviour
lets me get better plans. Without that I'll be back to using the "offset 0"
hack.
Is the "OFFSET 0" hack really so bad? We've been telling people to do
that for years, so it's already something that we've effectively
committed to.

How about adding the keywords FENCED and NOT FENCED to the SQL definition of CTE's - with FENCED being the default?


Cheers,
Gavin



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: SOLVED - RE: Poor performance using CTE

From
Bruce Momjian
Date:
On Tue, Nov 20, 2012 at 02:24:01PM -0600, Merlin Moncure wrote:
> On Tue, Nov 20, 2012 at 1:53 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:
> > As can be seen by the current conversation, not everyone is convinced
> that CTEs ought to be an explicit optimization barrier
>
> On Tue, Nov 20, 2012 at 1:26 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> > It *could* just be a lack of imagination on my part. But if it were
> > not, then it'd be nice for it to be done automatically (since this
> > particular CTE behavior bites enough people already).
>
> Sure.  I just find it personally hard to find a good demarcation line
> between A: "queries where pushing quals through are universally
> beneficial and wanted" and B: "queries where we are inserting an
> explicit materialization step to avoid planner issues", particularly
> where there is substantial overlap with between A and C: "queries that
> are written with a CTE and arguably shouldn't be".
>
> Put another way, I find CTE to express: 'this then that' where joins
> express 'this with that'.  So current behavior is not surprising at
> all. All that said, there could be a narrow class of low hanging cases
> (such as the OP's) that could be sniped...I'm just skeptical.

Is thi
--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +