Home > mailing lists

Thread: [HACKERS] CTE inlining

[HACKERS] CTE inlining

From

Ilya Shkuratov

Date:

29 April 2017, 20:45:00

Hello, dear hackers!

There is task in todo list about optional CTE optimization fence
disabling.

I am not interested at this point in disabling mechanism
implementation, but I would like to discuss the optimization
mechanism, that should work when the fence is disabled.

It seems, that we can replace CTE with subquery, so the optimizer
can do all available optimizations. This idea is quite
straightforward, but I could not find a discussion of it.
(Maybe it is so, because everyone knows that the idea is bad and it is
not worth to discuss. But I hope it is not, so I start this thread. =))

First of all, to such replacement to be valid, the CTE must be 1. non-writable (e.g. be of form: SELECT ...), 2.
donot use VOLATILE or STABLE functions, 3. ... (maybe there must be more restrictions?)

Also, before inlining, we should check that some optimization
can be applied, using functions from
'pull_up_subqueries_recurse' and 'subquery_push_qual'.

If it is true, and there only one reference to CTE,
we can inline it immediately.

What it is not clear is how we should estimate whether it is worth
to inline, when there is multiple references. Here are my preliminary
ideas.

Let consider "pull up subquery" and "push down qualifiers" cases
separately.

For "push down qualifiers", if `subquery_push_qual` is `true`,
we can do the following: 1. copy CTE subquery, 2. push down quals, 3. find paths, 3. inline if cost of
(CTEscan) > (cheapest_path(subquery) + subquery scan)

Probably, this approach is not feasible, because it involves subquery
replaning, and we should consider a more "lightweight" heuristic.

For "pull up subquery" similar approach may lead to duplicate planning
of the whole query, that almost sure is too expensive.
So I wonder, is it possible to estimate a join predicate selectivity
against CTE subquery result and inline it if selectivity is "high" enough?
(If it is possible the same can be applied to the first case.)

I would be glad to hear feedback on described approach.

Ilya Shkuratov

Re: [HACKERS] CTE inlining

From

Craig Ringer

Date:

30 April 2017, 06:51:18

On 30 Apr. 2017 07:56, "Ilya Shkuratov" <motr.ilya@ya.ru> wrote:

Hello, dear hackers!

There is task in todo list about optional CTE optimization fence
disabling.

I am not interested at this point in disabling mechanism
implementation, but I would like to discuss the optimization
mechanism, that should work when the fence is disabled.

It's looking at what other DBMSes do.

Notably MS SQL Server. AFAIK its CTEs are a lot like query-scoped views. They are simply updatable where possible, so you can write

WITH x AS (...)

UPDATE x SET ...

I do not know how MS SQL handles inlining and pullup/pushdown vs materialization, handles multiple evaluation costs, etc.

This is the model I would want to aim for.

It seems, that we can replace CTE with subquery, so the optimizer
can do all available optimizations. This idea is quite
straightforward, but I could not find a discussion of it.
(Maybe it is so, because everyone knows that the idea is bad and it is
not worth to discuss. But I hope it is not, so I start this thread. =))

It's not bad for SELECT.

But there are complexities.

- CTE terms may contain data-mutating functions people are relying on not multiply executing;

- we document that in postgres CTEs act as optimisation fences even with the standard syntax. So users rely on this as a query hint. Personally I want to relnotes this and tell people to use our OFFSET 0 hint instead, or add a NOINLINE option to our CTEs, then make pg allow inlining by default. This is a BC break, but not a big one if we restrict inlining of volatile. And since we don't have query hints (*cough*) by project policy, we can't really object to removing one can we?

- as you noted, it is hard to decide when it's worth inlining vs materializing for CTE terms referenced more than once. We should possibly start small and only inline single reference terms in the first release. We'd continue to force materializing of multiple reference terms.

That'd at least help people who use CTEs to write clearer queries not suffer for it. And it'd give us experience to help with conservatively introducing multiple reference inlining.

Re: [HACKERS] CTE inlining

From

Tom Lane

Date:

30 April 2017, 07:28:46

Craig Ringer <craig.ringer@2ndquadrant.com> writes:
> - as you noted, it is hard to decide when it's worth inlining vs
> materializing for CTE terms referenced more than once.

[ raised eyebrow... ]  Please explain why the answer isn't trivially
"never".

There's already a pretty large hill to climb here in the way of
breaking peoples' expectations about CTEs being optimization
fences.  Breaking the documented semantics about CTEs being
single-evaluation seems to me to be an absolute non-starter.
        regards, tom lane

Re: [HACKERS] CTE inlining

From

Pavel Stehule

Date:

30 April 2017, 08:19:21

2017-04-30 6:28 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:

Craig Ringer <craig.ringer@2ndquadrant.com> writes:
> - as you noted, it is hard to decide when it's worth inlining vs
> materializing for CTE terms referenced more than once.

[ raised eyebrow... ] Please explain why the answer isn't trivially
"never".

There's already a pretty large hill to climb here in the way of
breaking peoples' expectations about CTEs being optimization
fences. Breaking the documented semantics about CTEs being
single-evaluation seems to me to be an absolute non-starter.

why we cannot to introduce GUC option - enable_cteoptfence ?

Regards

Pavel

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CTE inlining

From

Andres Freund

Date:

30 April 2017, 08:28:16

On 2017-04-30 00:28:46 -0400, Tom Lane wrote:
> There's already a pretty large hill to climb here in the way of
> breaking peoples' expectations about CTEs being optimization
> fences.  Breaking the documented semantics about CTEs being
> single-evaluation seems to me to be an absolute non-starter.

If all referenced functions are non-volatile, I don't quite see the
problem?  Personally I believe we'll have to offer a proper
anti-inlining workaround anyway, and in that case there's really nothing
that should stop us from inlining CTE without volatile functions twice?

- Andres

Re: [HACKERS] CTE inlining

From

Craig Ringer

Date:

30 April 2017, 08:58:14

On 30 Apr. 2017 13:28, "Andres Freund" <andres@anarazel.de> wrote:

On 2017-04-30 00:28:46 -0400, Tom Lane wrote:
> There's already a pretty large hill to climb here in the way of
> breaking peoples' expectations about CTEs being optimization
> fences. Breaking the documented semantics about CTEs being
> single-evaluation seems to me to be an absolute non-starter.

If all referenced functions are non-volatile, I don't quite see the
problem? Personally I believe we'll have to offer a proper
anti-inlining workaround anyway, and in that case there's really nothing
that should stop us from inlining CTE without volatile functions twice?

Exactly.

The initial implementation had limitations. So they got documented as features, not bugs or possible future enhancements. Yay? So we're stuck with it forever?

I agree we shouldn't break working, correct queries such that they return different results. But if someone is lying about volatility they don't get the expectation of correctness. And we have a policy against hints, so surely we should be keen to remove this hack that serves as a hint - right?

We have OFFSET 0 for anyone really depending on it, and at least when you read that you know to go "wtf" and look at the manual, wheras the CTE fence behaviour is invisible and silent.

Yes, experienced and established postgres users expect the optimisation fence behaviour. They abuse it as a query hint or work around it with subqueries in FROM. They also know OFFSET 0 ... and ideally should even read the relnotes. Users from other DMBSes looking to migrate, and new users, are regularly surprised by our CTEs. I see it a lot on Stack Overflow and other places outside our comfortable walls.

Personally I find it very annoying when I'd like to use CTEs to structure queries more readably, but land up having to use subqueries in FROM instead.

Like the work Andes has been doing on our bizarre handing of SRFs in the SELECT target list I really think it's just something that needs to be done.

Re: [HACKERS] CTE inlining

From

Andres Freund

Date:

30 April 2017, 10:46:33

Hi,

On 2017-04-30 13:58:14 +0800, Craig Ringer wrote:
> We have OFFSET 0 for anyone really depending on it, and at least when you
> read that you know to go "wtf" and look at the manual, wheras the CTE fence
> behaviour is invisible and silent.

I don't think that's a good idea.  What if you need to prevent inlining
of something that actually needs an offset? What if the behaviour of
offset is ever supposed to change?  Relying more on that seems to just
be repeating the mistake around CTEs.

> Like the work Andes has been doing on our bizarre handing of SRFs in the
> SELECT target list I really think it's just something that needs to be
> done.

With help from Tom, luckily...

Greetings,

Andres Freund

Re: [HACKERS] CTE inlining

From

Ilya Shkuratov

Date:

30 April 2017, 11:34:48

30.04.2017, 08:58, "Craig Ringer" <craig.ringer@2ndquadrant.com>:

On 30 Apr. 2017 13:28, "Andres Freund" <andres@anarazel.de> wrote:
On 2017-04-30 00:28:46 -0400, Tom Lane wrote:
> There's already a pretty large hill to climb here in the way of
> breaking peoples' expectations about CTEs being optimization
> fences. Breaking the documented semantics about CTEs being
> single-evaluation seems to me to be an absolute non-starter.

If all referenced functions are non-volatile, I don't quite see the
problem? Personally I believe we'll have to offer a proper
anti-inlining workaround anyway, and in that case there's really nothing
that should stop us from inlining CTE without volatile functions twice?

Exactly.

The initial implementation had limitations. So they got documented as features, not bugs or possible future enhancements. Yay? So we're stuck with it forever?

I agree we shouldn't break working, correct queries such that they return different results. But if someone is lying about volatility they don't get the expectation of correctness. And we have a policy against hints, so surely we should be keen to remove this hack that serves as a hint - right?

We have OFFSET 0 for anyone really depending on it, and at least when you read that you know to go "wtf" and look at the manual, wheras the CTE fence behaviour is invisible and silent.

Yes, experienced and established postgres users expect the optimisation fence behaviour. They abuse it as a query hint or work around it with subqueries in FROM. They also know OFFSET 0 ... and ideally should even read the relnotes. Users from other DMBSes looking to migrate, and new users, are regularly surprised by our CTEs. I see it a lot on Stack Overflow and other places outside our comfortable walls.

Personally I find it very annoying when I'd like to use CTEs to structure queries more readably, but land up having to use subqueries in FROM instead.

Like the work Andes has been doing on our bizarre handing of SRFs in the SELECT target list I really think it's just something that needs to be done.

Also, I would like to remind that the disabling optimization fence is suggested to be OPTIONAL.

So we don't break peoples' expectations, nor documented semantics.

Re: [HACKERS] CTE inlining

From

Tomas Vondra

Date:

01 May 2017, 00:54:48

On 04/30/2017 06:28 AM, Tom Lane wrote:
> Craig Ringer <craig.ringer@2ndquadrant.com> writes:
>> - as you noted, it is hard to decide when it's worth inlining vs
>> materializing for CTE terms referenced more than once.
>
> [ raised eyebrow... ]  Please explain why the answer isn't trivially
> "never".
>
> There's already a pretty large hill to climb here in the way of
> breaking peoples' expectations about CTEs being optimization
> fences.  Breaking the documented semantics about CTEs being
> single-evaluation seems to me to be an absolute non-starter.
>

I'm not sure that's a universal expectation, though. I know there are 
people who actually do rely on that intentionally, no doubt about that. 
And we'd nee to make it work for them.

But I keep running into people who face serious performance issues 
exactly because not realizing this, and using CTEs as named subqueries. 
And when I tell them "optimization fence" they react "Whaaaaaaat?"

If I had to make up some numbers, I'd say the "Whaaaaat?" group is about 
10x the group of people who intentionally rely on CTEs being 
optimization fences.

FWIW I don't know how to do this. There were multiple attempts at this 
in the past, none of them succeeded. But perhaps we could at least 
propagate some of the CTE features, so that the outside query can 
benefit from that (e.g. when the CTE is sorted, we could set the 
sortkeys). That wouldn't break the fence thing, but it would allow other 
stuff.

regard

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CTE inlining

From

Tomas Vondra

Date:

01 May 2017, 01:00:18

On 04/30/2017 09:46 AM, Andres Freund wrote:
> Hi,
>
> On 2017-04-30 13:58:14 +0800, Craig Ringer wrote:
>> We have OFFSET 0 for anyone really depending on it, and at least when you
>> read that you know to go "wtf" and look at the manual, wheras the CTE fence
>> behaviour is invisible and silent.
>
> I don't think that's a good idea.  What if you need to prevent inlining
> of something that actually needs an offset? What if the behaviour of
> offset is ever supposed to change?  Relying more on that seems to just
> be repeating the mistake around CTEs.
>

I agree with this. But OFFSET 0 would force people to modify the queries 
anyway, so why not just introduce a new keyword instead? Something like:
    WITH FENCED a (SELECT ...)

But I think something like that was proposed not too long ago, and did 
not make it for some reason.

There's a lot of other CTE improvements that would be great. Say, being 
able to define indexes on them, but that's really a separate topic.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CTE inlining

From

David Fetter

Date:

01 May 2017, 01:25:23

On Sun, Apr 30, 2017 at 11:54:48PM +0200, Tomas Vondra wrote:
> On 04/30/2017 06:28 AM, Tom Lane wrote:
> > Craig Ringer <craig.ringer@2ndquadrant.com> writes:
> > > - as you noted, it is hard to decide when it's worth inlining vs
> > > materializing for CTE terms referenced more than once.
> > 
> > [ raised eyebrow... ]  Please explain why the answer isn't trivially
> > "never".
> > 
> > There's already a pretty large hill to climb here in the way of
> > breaking peoples' expectations about CTEs being optimization
> > fences.  Breaking the documented semantics about CTEs being
> > single-evaluation seems to me to be an absolute non-starter.
> > 
> 
> I'm not sure that's a universal expectation, though. I know there
> are people who actually do rely on that intentionally, no doubt
> about that. And we'd nee to make it work for them.
> 
> But I keep running into people who face serious performance issues
> exactly because not realizing this, and using CTEs as named
> subqueries. And when I tell them "optimization fence" they react
> "Whaaaaaaat?"
> 
> If I had to make up some numbers, I'd say the "Whaaaaat?" group is
> about 10x the group of people who intentionally rely on CTEs being
> optimization fences.

I suspect you're off by at least a couple of orders of magnitude here,
which make this even more important to deal with.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: [HACKERS] CTE inlining

From

Andres Freund

Date:

01 May 2017, 02:15:30

On 2017-04-30 11:34:48 +0300, Ilya Shkuratov wrote:
> Also, I would like to remind that the disabling optimization fence is suggested
> to be OPTIONAL.
> So we don't break peoples' expectations, nor documented semantics.

I think however is that that's not good enough, because it'll surprise
people forever, whereas using something as an intentional barrier
usually is, well, intentional.

Greetings,

Andres Freund

Re: [HACKERS] CTE inlining

From

Andres Freund

Date:

01 May 2017, 02:21:33

On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
> why we cannot to introduce GUC option - enable_cteoptfence ?

Doesn't really solve the issue, and we've generally shied away from GUCs
that influence behaviour after a few bad experiences.  What if you want
one CTE inlined, but another one not?

- Andres

Re: [HACKERS] CTE inlining

From

Pavel Stehule

Date:

01 May 2017, 07:22:50

2017-05-01 1:21 GMT+02:00 Andres Freund <andres@anarazel.de>:

On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
> why we cannot to introduce GUC option - enable_cteoptfence ?

Doesn't really solve the issue, and we've generally shied away from GUCs
that influence behaviour after a few bad experiences. What if you want
one CTE inlined, but another one not?

It change behave in same sense like enable_nestloop, enable_hashjoin, ... with same limits.

Regards

Pavel

- Andres

Re: [HACKERS] CTE inlining

From

Tomas Vondra

Date:

01 May 2017, 16:05:06

On 05/01/2017 06:22 AM, Pavel Stehule wrote:
>
>
> 2017-05-01 1:21 GMT+02:00 Andres Freund <andres@anarazel.de
> <mailto:andres@anarazel.de>>:
>
>     On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
>     > why we cannot to introduce GUC option - enable_cteoptfence ?
>
>     Doesn't really solve the issue, and we've generally shied away from GUCs
>     that influence behaviour after a few bad experiences.  What if you want
>     one CTE inlined, but another one not?
>
>
> It change behave in same sense like enable_nestloop, enable_hashjoin,
> ... with same limits.
>

Those (and also the other enable_*) GUCs are a great example why we 
should not use GUCs for tweaking planner behavior, except perhaps for 
the purpose of investigation. It's an extremely blunt tool.

You typically want to affect just a single node in the query plan (say, 
one join), but those options don't allow you to do that. It's all or 
nothing thing.

Even if you're OK with affecting the whole query, it's a separate 
control channel - it's not embedded in the query, the user has to set it 
somehow. So you either set it for the whole session (affecting all the 
other queries that don't really need it), or you set it before each 
query. Which however sucks for a number of reasons, e.g. if you have a 
slow query in the log, how do you know with what GUC values it was 
executed? (You don't, and there's no way to find out.)

Exactly the same issues would affect this new GUC. It would be 
impossible to use multiple CTEs in the query with different fencing 
behavior, and it would be just as difficult to investigate.

So no more planner-affecting GUCs, please, particularly if we expect 
regular users to use them.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CTE inlining

From

Andrew Dunstan

Date:

01 May 2017, 16:22:42


On 05/01/2017 09:05 AM, Tomas Vondra wrote:
> On 05/01/2017 06:22 AM, Pavel Stehule wrote:
>>
>>
>> 2017-05-01 1:21 GMT+02:00 Andres Freund <andres@anarazel.de
>> <mailto:andres@anarazel.de>>:
>>
>>     On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
>>     > why we cannot to introduce GUC option - enable_cteoptfence ?
>>
>>     Doesn't really solve the issue, and we've generally shied away
>> from GUCs
>>     that influence behaviour after a few bad experiences.  What if
>> you want
>>     one CTE inlined, but another one not?
>>
>>
>> It change behave in same sense like enable_nestloop, enable_hashjoin,
>> ... with same limits.
>>
>
> Those (and also the other enable_*) GUCs are a great example why we
> should not use GUCs for tweaking planner behavior, except perhaps for
> the purpose of investigation. It's an extremely blunt tool.
>
> You typically want to affect just a single node in the query plan
> (say, one join), but those options don't allow you to do that. It's
> all or nothing thing.
>
> Even if you're OK with affecting the whole query, it's a separate
> control channel - it's not embedded in the query, the user has to set
> it somehow. So you either set it for the whole session (affecting all
> the other queries that don't really need it), or you set it before
> each query. Which however sucks for a number of reasons, e.g. if you
> have a slow query in the log, how do you know with what GUC values it
> was executed? (You don't, and there's no way to find out.)
>
> Exactly the same issues would affect this new GUC. It would be
> impossible to use multiple CTEs in the query with different fencing
> behavior, and it would be just as difficult to investigate.
>
> So no more planner-affecting GUCs, please, particularly if we expect
> regular users to use them.
>
>


+1

I still see users wanting to use the enable_foo settings in production.

Having had years of telling users that CTEs are an optimization fence it
doesn't seem at all nice for us to turn around and change our mind about
that. I have relied on it in the past and I'm sure I'm very far from
alone in that.

Maybe we could allow a "decorator" that would tell the planner the CTE
could be inlined?
   WITH INLINE mycte AS ( ...)

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CTE inlining

From

Craig Ringer

Date:

01 May 2017, 16:46:02

On 1 May 2017 at 21:22, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:

> Having had years of telling users that CTEs are an optimization fence it
> doesn't seem at all nice for us to turn around and change our mind about
> that. I have relied on it in the past and I'm sure I'm very far from
> alone in that.
>
> Maybe we could allow a "decorator" that would tell the planner the CTE
> could be inlined?
>
>     WITH INLINE mycte AS ( ...)

I'd rather reverse that so we behave like other implementations by
default, and have extension syntax for our no-inline query hint. And
yes, that's what it is, because we'd only inline when we could produce
semantically equivalent results anyway.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] CTE inlining

From

David Fetter

Date:

01 May 2017, 17:17:34

On Mon, May 01, 2017 at 09:22:42AM -0400, Andrew Dunstan wrote:
> > So no more planner-affecting GUCs, please, particularly if we expect
> > regular users to use them.
> 
> +1
> 
> I still see users wanting to use the enable_foo settings in production.
> 
> Having had years of telling users that CTEs are an optimization fence it
> doesn't seem at all nice for us to turn around and change our mind about
> that. I have relied on it in the past and I'm sure I'm very far from
> alone in that.

You are certainly not alone, but I believe that in this you're missing
the vast majority (we hope) of PostgreSQL users.  These are the users
who have yet to adopt PostgreSQL, and have the quite reasonable
expectation that ordinary-looking grammar *isn't* an optimization
fence.

> Maybe we could allow a "decorator" that would tell the planner the CTE
> could be inlined?
> 
>     WITH INLINE mycte AS ( ...)

+1 for a decorator, -1 for this one.

We already have an explicit optimization fence with OFFSET 0, and I
think making optimization fences explicit is how we should continue.
I'd be more in favor of something along the lines of
   WITH FENCED        /* Somewhat fuzzy.  What fence? */   or   WITH AT_MOST_ONCE  /* Clearer, but not super precise */
 or   WITH UNIQUE_ATOMIC /* More descriptive, but not super clear without the docs in hand */

or something along that line.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: [HACKERS] CTE inlining

From

Andreas Karlsson

Date:

01 May 2017, 17:26:22

On 05/01/2017 04:17 PM, David Fetter wrote:
>> Maybe we could allow a "decorator" that would tell the planner the CTE
>> could be inlined?
>>
>>     WITH INLINE mycte AS ( ...)
>
> +1 for a decorator, -1 for this one.

I am not sure I like decorators since this means adding an ad hoc query 
hint directly into the SQL syntax which is something which I requires 
serious consideration.

> We already have an explicit optimization fence with OFFSET 0, and I
> think making optimization fences explicit is how we should continue.
> I'd be more in favor of something along the lines of
>
>     WITH FENCED        /* Somewhat fuzzy.  What fence? */
>     or
>     WITH AT_MOST_ONCE  /* Clearer, but not super precise */
>     or
>     WITH UNIQUE_ATOMIC /* More descriptive, but not super clear without the docs in hand */
>
> or something along that line.

What about WITH MATERIALIZED, borrowing from the MySQL terminology 
"materialized subquery"?

Andreas

Re: [HACKERS] CTE inlining

From

"David G. Johnston"

Date:

01 May 2017, 17:33:26

On Mon, May 1, 2017 at 7:26 AM, Andreas Karlsson <andreas@proxel.se> wrote:

On 05/01/2017 04:17 PM, David Fetter wrote:
Maybe we could allow a "decorator" that would tell the planner the CTE
could be inlined?

WITH INLINE mycte AS ( ...)

+1 for a decorator, -1 for this one.

I am not sure I like decorators since this means adding an ad hoc query hint directly into the SQL syntax which is something which I requires serious consideration.

Given that we already have

"

prevent optimization

"

syntax why do we need a decorator on the CTE?

We already have an explicit optimization fence with OFFSET 0, and I
think making optimization fences explicit is how we should continue.
I'd be more in favor of something along the lines of

WITH FENCED /* Somewhat fuzzy. What fence? */
or
WITH AT_MOST_ONCE /* Clearer, but not super precise */
or
WITH UNIQUE_ATOMIC /* More descriptive, but not super clear without the docs in hand */

or something along that line.

What about WITH MATERIALIZED, borrowing from the MySQL terminology "materialized subquery"?

I would shorten that to "WITH MAT" except that I don't think that having two way to introduce an optimization fence is worthwhile.

If we don't optimize SRFs-in-target-list, and thus avoid multiple function evaluation for (composite_col).*, I believe a significant number of intentional optimization fence uses will be safe but behavioral changes.

David J.

Re: [HACKERS] CTE inlining

From

Andreas Karlsson

Date:

01 May 2017, 17:43:11

On 05/01/2017 04:33 PM, David G. Johnston wrote:> On Mon, May 1, 2017 at 7:26 AM, Andreas Karlsson <andreas@proxel.se>
  I am not sure I like decorators since this means adding an ad hoc>     query hint directly into the SQL syntax which
issomething which I>     requires serious consideration.>> Given that we already have> "> prevent optimization> ">
syntaxwhy do we need a decorator on the CTE?
 

I do not think I follow. Me and some other people here would ideally 
allow CTEs to be inlined by default. Some people today use CTEs as 
optimization fences, to for example control join order, and the 
suggestion here is to add new syntax for CTEs to allow them to 
selectively be used as optimization fences.
> I would shorten that to "WITH MAT" except that I don't think that> having two way to introduce an optimization fence
isworthwhile.
 

You mean OFFSET 0? I have never been a fan of using it as an 
optimization fence. I do not think OFFSET 0 conveys clearly enough to 
the reader that is is an optimization fence.

Andreas

Re: [HACKERS] CTE inlining

From

Corey Huinker

Date:

01 May 2017, 17:44:44

On Mon, May 1, 2017 at 10:26 AM, Andreas Karlsson <andreas@proxel.se> wrote:

What about WITH MATERIALIZED, borrowing from the MySQL terminology "materialized subquery"?

+1, you beat me to it.

Re: [HACKERS] CTE inlining

From

"David G. Johnston"

Date:

01 May 2017, 17:53:06

On Mon, May 1, 2017 at 7:43 AM, Andreas Karlsson <andreas@proxel.se> wrote:

On 05/01/2017 04:33 PM, David G. Johnston wrote:
> On Mon, May 1, 2017 at 7:26 AM, Andreas Karlsson <andreas@proxel.se
> I am not sure I like decorators since this means adding an ad hoc
> query hint directly into the SQL syntax which is something which I
> requires serious consideration.

> I would shorten that to "WITH MAT" except that I don't think that
> having two way to introduce an optimization fence is worthwhile.

You mean OFFSET 0? I have never been a fan of using it as an optimization fence. I do not think OFFSET 0 conveys clearly enough to the reader that is is an optimization fence.

I think I was being too literal in my thinking. Proposing that we effectively do away with OFFSET 0 and instead recommend "WITH MAT name AS ()" for subqueries requiring standalone evaluation is something I can agree with. I too have always though of OFFSET 0 as hack-ish which is why my SRF usage examples have always used CTEs.

David J.

Re: [HACKERS] CTE inlining

From

Andrew Dunstan

Date:

01 May 2017, 18:04:47


On 05/01/2017 10:17 AM, David Fetter wrote:
> On Mon, May 01, 2017 at 09:22:42AM -0400, Andrew Dunstan wrote:
>>> So no more planner-affecting GUCs, please, particularly if we expect
>>> regular users to use them.
>> +1
>>
>> I still see users wanting to use the enable_foo settings in production.
>>
>> Having had years of telling users that CTEs are an optimization fence it
>> doesn't seem at all nice for us to turn around and change our mind about
>> that. I have relied on it in the past and I'm sure I'm very far from
>> alone in that.
> You are certainly not alone, but I believe that in this you're missing
> the vast majority (we hope) of PostgreSQL users.  These are the users
> who have yet to adopt PostgreSQL, and have the quite reasonable
> expectation that ordinary-looking grammar *isn't* an optimization
> fence.
>



I am not in favor of seriously inconveniencing a significant number of
our existing users for the sake of a bunch of people who don't now and
might never in the future use Postgres. I think the bar for silent
behaviour changes needs to be a bit higher than this one is.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CTE inlining

From

Craig Ringer

Date:

02 May 2017, 05:38:02

On 1 May 2017 at 22:26, Andreas Karlsson <andreas@proxel.se> wrote:

> I am not sure I like decorators since this means adding an ad hoc query hint
> directly into the SQL syntax which is something which I requires serious
> consideration.

And mangling the semantics of existing syntax doesn't?

That's what we do right now so we can pretend we don't have query
hints while still having query hints.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] CTE inlining

From

Robert Haas

Date:

02 May 2017, 05:42:18

On Sun, Apr 30, 2017 at 5:54 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> But I keep running into people who face serious performance issues exactly
> because not realizing this, and using CTEs as named subqueries. And when I
> tell them "optimization fence" they react "Whaaaaaaat?"
>
> If I had to make up some numbers, I'd say the "Whaaaaat?" group is about 10x
> the group of people who intentionally rely on CTEs being optimization
> fences.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CTE inlining

From

Craig Ringer

Date:

02 May 2017, 05:45:40

On 1 May 2017 at 21:05, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> On 05/01/2017 06:22 AM, Pavel Stehule wrote:
>>
>>
>>
>> 2017-05-01 1:21 GMT+02:00 Andres Freund <andres@anarazel.de
>> <mailto:andres@anarazel.de>>:
>>
>>     On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
>>     > why we cannot to introduce GUC option - enable_cteoptfence ?
>>
>>     Doesn't really solve the issue, and we've generally shied away from
>> GUCs
>>     that influence behaviour after a few bad experiences.  What if you
>> want
>>     one CTE inlined, but another one not?
>>
>>
>> It change behave in same sense like enable_nestloop, enable_hashjoin,
>> ... with same limits.
>>
>
> Those (and also the other enable_*) GUCs are a great example why we should
> not use GUCs for tweaking planner behavior, except perhaps for the purpose
> of investigation. It's an extremely blunt tool.

Definitely agree. You can't use them to affect only a portion of a
query. In fact there's no way to automatically scope them to one query
at all. They're also very big hammers, and as we introduce new types
of plan nodes they won't offer comprehensive control without being
amended. They're a tool of last resort for dealing with problems.

> Exactly the same issues would affect this new GUC. It would be impossible to
> use multiple CTEs in the query with different fencing behavior, and it would
> be just as difficult to investigate.

Yeah, I think a GUC is a non-starter.

If we want fence behaviour, we should require people to declare their
desire for fence behaviour, rather than treating it as a sort of
hint-as-a-bug that we grandfather in because we're so desperate not to
admit we have hints.

Before forming a strong view on this, I strongly everyone stick your
head outside the postgresql.org lists for a while. In my experience
even regular postgres users I see pop up on places like Stack Overflow
tend to react to this behaviour with "WTF?!" and wonder why we haven't
fixed this limitation yet, viewing it as a bug not a feature.

The same logic being applied here should've prevented us from ever introducing:

* inlining of SQL functions
* inlining of views
* inlining of subqueries

... but somehow, this one is different.

We're not even removing functionality. You can still use the OFFSET 0
hack. If you need a nonzero offset that's fine, because we don't
inline over OFFSET anyway.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] CTE inlining

From

Craig Ringer

Date:

02 May 2017, 09:23:26

On 2 May 2017 at 10:45, Craig Ringer <craig@2ndquadrant.com> wrote:

> If we want fence behaviour, we should require people to declare their
> desire for fence behaviour, rather than treating it as a sort of
> hint-as-a-bug that we grandfather in because we're so desperate not to
> admit we have hints.

Answers to my question here https://dba.stackexchange.com/q/27425/7788
suggest that there's no justification in the standard for treating it
as a mandatory fence.

Here https://www.postgresql.org/message-id/29918.1320244719@sss.pgh.pa.us
Tom says

"CTEs are also treated as optimization fences; this is not so much an
optimizer limitation as to keep the semantics sane when the CTE contains
a writable query."

In my view, if we address that by walking the query tree for volatile
functions, we should be perfectly fine to inline single-reference
CTEs. Is that too simplistic?

Is the sticking point simply this docs text:

"A useful property of WITH queries is that they are evaluated only
once per execution of the parent query, even if they are referred to
more than once by the parent query or sibling WITH queries. Thus,
expensive calculations that are needed in multiple places can be
placed within a WITH query to avoid redundant work. Another possible
application is to prevent unwanted multiple evaluations of functions
with side-effects. However, the other side of this coin is that the
optimizer is less able to push restrictions from the parent query down
into a WITH query than an ordinary subquery. The WITH query will
generally be evaluated as written, without suppression of rows that
the parent query might discard afterwards. (But, as mentioned above,
evaluation might stop early if the reference(s) to the query demand
only a limited number of rows.)"

?

... because to me, that seems to offer a great deal of wiggle room in
"less able", "generally", etc, even if we treated the docs as a hard
specification of future behaviour. (Which we don't, since we change
things and document the changes).

We may also stop evaluation early, so there's already no guarantee the
full query is run.

Is there any *technical* rather than policy reason we could not safely
inline a CTE term referenced by only one query, where the CTE term is
a SELECT, contains no volatile function calls, is not RECURSIVE, and
does not its self reference another non-inlineable CTE term?

Josh Kupershmidt, in a comment on my blog post on this topic some time
ago (https://blog.2ndquadrant.com/postgresql-ctes-are-optimization-fences/),
pointed out that

"One notable exception to the “(non-writeable) CTEs are always
materialized” rule is that if a non-writeable CTE node is not
referenced anywhere, it won’t actually be evaluated. For example, this
query returns 1 instead of bombing out:

WITH not_executed AS (SELECT 1/0),
    executed AS (SELECT 1)

SELECT * FROM executed;"

Prior discussions:

* https://www.postgresql.org/message-id/201209191305.44674.db@kavod.com
* http://archives.postgresql.org/pgsql-performance/2011-10/msg00208.php
* https://www.postgresql.org/message-id/29918.1320244719@sss.pgh.pa.us

Relevant posts where users get confused by our behaviour:

* https://dba.stackexchange.com/q/127828/7788
* https://news.ycombinator.com/item?id=7023907
* https://dba.stackexchange.com/q/84760/7788
* https://dba.stackexchange.com/q/97393/7788
* http://stackoverflow.com/q/20403792/398670
* http://stackoverflow.com/q/33731068/398670
* ....

Also, comments on my post:

* https://blog.2ndquadrant.com/postgresql-ctes-are-optimization-fences/

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] CTE inlining

From

Andreas Karlsson

Date:

02 May 2017, 15:40:55

On 05/02/2017 04:38 AM, Craig Ringer wrote:
> On 1 May 2017 at 22:26, Andreas Karlsson <andreas@proxel.se> wrote:
>> I am not sure I like decorators since this means adding an ad hoc query hint
>> directly into the SQL syntax which is something which I requires serious
>> consideration.
>
> And mangling the semantics of existing syntax doesn't?
>
> That's what we do right now so we can pretend we don't have query
> hints while still having query hints.

I am in favor of removing the optimization fence from CTEs, and strongly 
prefer no fence being the default behavior since SQL is a declarative 
language and I think it is reasonable to assume that CTEs can be 
inlined. But the question is how to best remove the fence while taking 
into account that quite many use them as optimization fences today.

I see some alternatives, none of them perfect.

1. Just remove the optimization fence and let people add OFFSET 0 to 
their queries if they want an optimization fence. This lets us keep 
pretending that we do not have query hints (and therefore do not have to 
formalize any syntax for them) while still allowing people to add 
optimization fences.

2. Add a decorator for WITH (e.g. "WITH MATERIALIZED x (...") to add an 
explicit optimization fence. This will for the first time add official 
support for a query hint in the syntax which is a quite big precedent.

3. Add a new GUC which can enable and disable the optimization fence. 
This is a very clumsy tool, but maybe good enough for some users and 
some people here in this thread have complained about our similar GUCs.

4. Add some new more generic query hinting facility. This is a lot of 
work and something which would be very hard to get consensus for.

Andreas

Re: [HACKERS] CTE inlining

From

Oleg Bartunov

Date:

02 May 2017, 16:06:08

On Mon, May 1, 2017 at 7:22 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>
> 2017-05-01 1:21 GMT+02:00 Andres Freund <andres@anarazel.de>:
>>
>> On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
>> > why we cannot to introduce GUC option - enable_cteoptfence ?
>>
>> Doesn't really solve the issue, and we've generally shied away from GUCs
>> that influence behaviour after a few bad experiences.  What if you want
>> one CTE inlined, but another one not?
>
>
> It change behave in same sense like enable_nestloop, enable_hashjoin, ...
> with same limits.

And then we recall  plan hints :)

>
> Regards
>
> Pavel
>
>>
>>
>> - Andres
>
>

Re: [HACKERS] CTE inlining

From

Merlin Moncure

Date:

02 May 2017, 17:13:08

On Sun, Apr 30, 2017 at 6:21 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
>> why we cannot to introduce GUC option - enable_cteoptfence ?
>
> Doesn't really solve the issue, and we've generally shied away from GUCs
> that influence behaviour after a few bad experiences.  What if you want
> one CTE inlined, but another one not?

Yeah.  Are we absolutely opposed to SQL syntax against WITH that
allows or disallows fencing?   for example,

WITH [MATERIALIZED]

Pushing people to OFFSET 0 is a giant step backwards IMO, and as in
implementation detail is also subject to change.

merlin

Re: [HACKERS] CTE inlining

From

Andrew Dunstan

Date:

02 May 2017, 17:44:33


On 05/02/2017 10:13 AM, Merlin Moncure wrote:
> On Sun, Apr 30, 2017 at 6:21 PM, Andres Freund <andres@anarazel.de> wrote:
>> On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
>>> why we cannot to introduce GUC option - enable_cteoptfence ?
>> Doesn't really solve the issue, and we've generally shied away from GUCs
>> that influence behaviour after a few bad experiences.  What if you want
>> one CTE inlined, but another one not?
> Yeah.  Are we absolutely opposed to SQL syntax against WITH that
> allows or disallows fencing?   for example,
>
> WITH [MATERIALIZED]
>
> Pushing people to OFFSET 0 is a giant step backwards IMO, and as in
> implementation detail is also subject to change.
>
>



Agreed, it's an ugly as sin and completely non-obvious hack.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CTE inlining

From

Tomas Vondra

Date:

02 May 2017, 18:43:45

On 5/2/17 4:44 PM, Andrew Dunstan wrote:
> 
> 
> On 05/02/2017 10:13 AM, Merlin Moncure wrote:
>> On Sun, Apr 30, 2017 at 6:21 PM, Andres Freund <andres@anarazel.de> wrote:
>>> On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
>>>> why we cannot to introduce GUC option - enable_cteoptfence ?
>>> Doesn't really solve the issue, and we've generally shied away from GUCs
>>> that influence behaviour after a few bad experiences.  What if you want
>>> one CTE inlined, but another one not?
>> Yeah.  Are we absolutely opposed to SQL syntax against WITH that
>> allows or disallows fencing?   for example,
>>
>> WITH [MATERIALIZED]
>>
>> Pushing people to OFFSET 0 is a giant step backwards IMO, and as in
>> implementation detail is also subject to change.
>>
>>
> 
> Agreed, it's an ugly as sin and completely non-obvious hack.
> 

Isn't OFFSET 0 an implementation detail anyway? Who says the planner 
couldn't get smarter in the future, realize OFFSET 0 is no-op?

In that case replacing CTE optimization fence with "OFFSET 0" would be 
akin to painting yourself into a corner, waiting for the pain to dry, 
walking over to another corner and painting yourself into that one.

cheers

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CTE inlining

From

Thomas Kellerer

Date:

02 May 2017, 19:01:14

> Relevant posts where users get confused by our behaviour:
> ....

And Markus Winand's blog: http://modern-sql.com/feature/with/performance
   Databases generally obey this principle, although PostgreSQL represents
a big exception 

and 
   Besides PostgreSQL, all tested databases optimize with clauses in the
same way    that they optimize views or derived tables"






--
View this message in context: http://www.postgresql-archive.org/CTE-inlining-tp5958992p5959313.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: [HACKERS] CTE inlining

From

David Fetter

Date:

02 May 2017, 19:34:07

On Tue, May 02, 2017 at 02:40:55PM +0200, Andreas Karlsson wrote:
> On 05/02/2017 04:38 AM, Craig Ringer wrote:
> > On 1 May 2017 at 22:26, Andreas Karlsson <andreas@proxel.se> wrote:
> > > I am not sure I like decorators since this means adding an ad hoc query hint
> > > directly into the SQL syntax which is something which I requires serious
> > > consideration.
> > 
> > And mangling the semantics of existing syntax doesn't?
> > 
> > That's what we do right now so we can pretend we don't have query
> > hints while still having query hints.
> 
> I am in favor of removing the optimization fence from CTEs, and strongly
> prefer no fence being the default behavior since SQL is a declarative
> language and I think it is reasonable to assume that CTEs can be inlined.
> But the question is how to best remove the fence while taking into account
> that quite many use them as optimization fences today.
> 
> I see some alternatives, none of them perfect.
> 
> 1. Just remove the optimization fence and let people add OFFSET 0 to their
> queries if they want an optimization fence. This lets us keep pretending
> that we do not have query hints (and therefore do not have to formalize any
> syntax for them) while still allowing people to add optimization fences.

+1

I get that people with gigantic PostgreSQL installations with
stringent performance requirements sometimes need to do odd things to
squeeze out the last few percentage points of performance.  As the
people (well, at least the people close to the ground) at these
organizations are fully aware, performance optimizations are extremely
volatile with respect to new versions of software, whether it's
PostgreSQL, Oracle, the Linux kernel, or what have you.  They expect
this, and they have processes in place to handle it.  If they don't,
it's pilot error.

We should not be penalizing all our other users to maintain the
fiction that people can treat performance optimizations as a "fire and
forget" matter.

> 2. Add a decorator for WITH (e.g. "WITH MATERIALIZED x (...") to add an
> explicit optimization fence. This will for the first time add official
> support for a query hint in the syntax which is a quite big precedent.

Yep.  It's one we should think very carefully before we introduce.

> 3. Add a new GUC which can enable and disable the optimization fence. This
> is a very clumsy tool, but maybe good enough for some users and some people
> here in this thread have complained about our similar GUCs.

Any GUC would be unable to distinguish one WITH clause from another.
The hammer would then be guaranteed to be too big for precisely the
cases where it's most needed.

> 4. Add some new more generic query hinting facility. This is a lot
> of work and something which would be very hard to get consensus for.

Just the design of the thing would be the work of months at a minimum,
assuming we got to some consensus at all.  Maybe it's worth doing.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: [HACKERS] CTE inlining

From

Corey Huinker

Date:

02 May 2017, 19:51:35

I get that people with gigantic PostgreSQL installations with
stringent performance requirements sometimes need to do odd things to
squeeze out the last few percentage points of performance. As the
people (well, at least the people close to the ground) at these
organizations are fully aware, performance optimizations are extremely
volatile with respect to new versions of software, whether it's
PostgreSQL, Oracle, the Linux kernel, or what have you. They expect
this, and they have processes in place to handle it. If they don't,
it's pilot error.

Well put. People on the ground in those situations go to great lengths to freeze the query plan as-is. For them, an upgrade is something that is done after months of planning. They might be surprised by the dropping of this optimization fence, but the surprise won't be in production, and they've got just as good of chance of being pleasantly surprised.

> 2. Add a decorator for WITH (e.g. "WITH MATERIALIZED x (...") to add an
> explicit optimization fence. This will for the first time add official
> support for a query hint in the syntax which is a quite big precedent.

Yep. It's one we should think very carefully before we introduce.

There's a tiny, oblique precedence for this with Oracle's WITH [FUNCTION | PROCEDURE] syntax. In both cases, the user is able to create an ephemeral object that can be referenced later in the query. Good idea or bad, it's a sign that others have been fine with that conceptual path.

Personally, I'm fine with WITH MATERIALIZED, but I'm also ok with just not promising the fence. I think there is value in letting users break up a complex query into understandable WITH-chunks, and that value shouldn't prevent good performance. The fence will probably still be there anyway in the case of INSERT/UPDATE RETURNING and cases where a CTE is referenced more than once in the query that follows.

Re: [HACKERS] CTE inlining

From

Tomas Vondra

Date:

02 May 2017, 20:05:10

On 5/2/17 6:34 PM, David Fetter wrote:
> On Tue, May 02, 2017 at 02:40:55PM +0200, Andreas Karlsson wrote:
>> On 05/02/2017 04:38 AM, Craig Ringer wrote:
>>> On 1 May 2017 at 22:26, Andreas Karlsson <andreas@proxel.se> wrote:>>
>> ...
>>
>> I see some alternatives, none of them perfect.
>>
>> 1. Just remove the optimization fence and let people add OFFSET 0 to their
>> queries if they want an optimization fence. This lets us keep pretending
>> that we do not have query hints (and therefore do not have to formalize any
>> syntax for them) while still allowing people to add optimization fences.
> 
> +1
> 
> I get that people with gigantic PostgreSQL installations with
> stringent performance requirements sometimes need to do odd things to
> squeeze out the last few percentage points of performance.  As the
> people (well, at least the people close to the ground) at these
> organizations are fully aware, performance optimizations are extremely
> volatile with respect to new versions of software, whether it's
> PostgreSQL, Oracle, the Linux kernel, or what have you.  They expect
> this, and they have processes in place to handle it.  If they don't,
> it's pilot error.
> 
> We should not be penalizing all our other users to maintain the
> fiction that people can treat performance optimizations as a "fire and
> forget" matter.
> 

Agreed.

>> 2. Add a decorator for WITH (e.g. "WITH MATERIALIZED x (...") to add an
>> explicit optimization fence. This will for the first time add official
>> support for a query hint in the syntax which is a quite big precedent.
> 
> Yep.  It's one we should think very carefully before we introduce.
> 

I think it's a mistake to see this as an introduction of query hits.

Firstly, it's a question whether it qualifies as a hint. I wouldn't call 
it a hint, but let's assume there is a definition of query hints that 
includes WITH MATERIALIZED.

More importantly, however, this is not introducing anything new. It's 
just a different name for the current "WITH" semantics, and you can 
achieve the same behavior by "OFFSET 0". And people are already using 
these as hints, so I fail to see how this introduces anything new.

In fact, if you see the optimization fence as an implicit query hint, 
this actually *removes* a hint (although most users are unaware of that 
behavior and use it unintentionally).


>> 3. Add a new GUC which can enable and disable the optimization fence. This
>> is a very clumsy tool, but maybe good enough for some users and some people
>> here in this thread have complained about our similar GUCs.
> 
> Any GUC would be unable to distinguish one WITH clause from another.
> The hammer would then be guaranteed to be too big for precisely the
> cases where it's most needed.
> 

If I could, I'd give -1 million to a GUC-based approach, as that would 
make it entirely unusable in practice, I think.

Actually, I can give -1 million, so I'm giving it.
>>
>> 4. Add some new more generic query hinting facility. This is a lot
>> of work and something which would be very hard to get consensus for.
> 
> Just the design of the thing would be the work of months at a minimum,
> assuming we got to some consensus at all.  Maybe it's worth doing.
> 

While I came to conclusion that query hints may be quite useful in some 
situations, I'm pretty sure this is not a battle you'd like to fight.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CTE inlining

From

Merlin Moncure

Date:

03 May 2017, 00:23:16

\On Tue, May 2, 2017 at 12:05 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> On 5/2/17 6:34 PM, David Fetter wrote:
>>
>> On Tue, May 02, 2017 at 02:40:55PM +0200, Andreas Karlsson wrote:
>>>
>>> On 05/02/2017 04:38 AM, Craig Ringer wrote:
>>>>
>>>> On 1 May 2017 at 22:26, Andreas Karlsson <andreas@proxel.se> wrote:
>
>>>
>>>
>>> ...
>>>
>>> I see some alternatives, none of them perfect.
>>>
>>> 1. Just remove the optimization fence and let people add OFFSET 0 to
>>> their
>>> queries if they want an optimization fence. This lets us keep pretending
>>> that we do not have query hints (and therefore do not have to formalize
>>> any
>>> syntax for them) while still allowing people to add optimization fences.
>>
>>
>> +1
>>
>> I get that people with gigantic PostgreSQL installations with
>> stringent performance requirements sometimes need to do odd things to
>> squeeze out the last few percentage points of performance.  As the
>> people (well, at least the people close to the ground) at these
>> organizations are fully aware, performance optimizations are extremely
>> volatile with respect to new versions of software, whether it's
>> PostgreSQL, Oracle, the Linux kernel, or what have you.  They expect
>> this, and they have processes in place to handle it.  If they don't,
>> it's pilot error.
>>
>> We should not be penalizing all our other users to maintain the
>> fiction that people can treat performance optimizations as a "fire and
>> forget" matter.
>>
>
> Agreed.
>
>>> 2. Add a decorator for WITH (e.g. "WITH MATERIALIZED x (...") to add an
>>> explicit optimization fence. This will for the first time add official
>>> support for a query hint in the syntax which is a quite big precedent.
>>
>>
>> Yep.  It's one we should think very carefully before we introduce.
>>
>
> I think it's a mistake to see this as an introduction of query hits.
>
> Firstly, it's a question whether it qualifies as a hint. I wouldn't call it
> a hint, but let's assume there is a definition of query hints that includes
> WITH MATERIALIZED.
>
> More importantly, however, this is not introducing anything new. It's just a
> different name for the current "WITH" semantics, and you can achieve the
> same behavior by "OFFSET 0". And people are already using these as hints, so
> I fail to see how this introduces anything new.
>
> In fact, if you see the optimization fence as an implicit query hint, this
> actually *removes* a hint (although most users are unaware of that behavior
> and use it unintentionally).

+1 down the line.  More to the point, for several years now we've (or
at least I, but I'm not the only one) have been advocating for the
usage of CTE to avoid the undocumented and bizarre OFFSET 0 trick.
Jerking this out from users without giving a simple mechanic to get
the same behavior minus a major query rewrite is blatantly user
hostile.  I can't believe we're even contemplating it.   Also a GUC is
not a solution for pretty obvious reasons I think.

merlin

Re: [HACKERS] CTE inlining

From

Tomas Vondra

Date:

03 May 2017, 02:00:38


On 5/2/17 11:23 PM, Merlin Moncure wrote:
> \On Tue, May 2, 2017 at 12:05 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> On 5/2/17 6:34 PM, David Fetter wrote:
>>>
>>> On Tue, May 02, 2017 at 02:40:55PM +0200, Andreas Karlsson wrote:
>>>>
>>>> On 05/02/2017 04:38 AM, Craig Ringer wrote:
>>>>>
>>>>> On 1 May 2017 at 22:26, Andreas Karlsson <andreas@proxel.se> wrote:
>>
>>>>
>>>>
>>>> ...
>>>>
>>>> I see some alternatives, none of them perfect.
>>>>
>>>> 1. Just remove the optimization fence and let people add OFFSET 0 to
>>>> their
>>>> queries if they want an optimization fence. This lets us keep pretending
>>>> that we do not have query hints (and therefore do not have to formalize
>>>> any
>>>> syntax for them) while still allowing people to add optimization fences.
>>>
>>>
>>> +1
>>>
>>> I get that people with gigantic PostgreSQL installations with
>>> stringent performance requirements sometimes need to do odd things to
>>> squeeze out the last few percentage points of performance.  As the
>>> people (well, at least the people close to the ground) at these
>>> organizations are fully aware, performance optimizations are extremely
>>> volatile with respect to new versions of software, whether it's
>>> PostgreSQL, Oracle, the Linux kernel, or what have you.  They expect
>>> this, and they have processes in place to handle it.  If they don't,
>>> it's pilot error.
>>>
>>> We should not be penalizing all our other users to maintain the
>>> fiction that people can treat performance optimizations as a "fire and
>>> forget" matter.
>>>
>>
>> Agreed.
>>
>>>> 2. Add a decorator for WITH (e.g. "WITH MATERIALIZED x (...") to add an
>>>> explicit optimization fence. This will for the first time add official
>>>> support for a query hint in the syntax which is a quite big precedent.
>>>
>>>
>>> Yep.  It's one we should think very carefully before we introduce.
>>>
>>
>> I think it's a mistake to see this as an introduction of query hits.
>>
>> Firstly, it's a question whether it qualifies as a hint. I wouldn't call it
>> a hint, but let's assume there is a definition of query hints that includes
>> WITH MATERIALIZED.
>>
>> More importantly, however, this is not introducing anything new. It's just a
>> different name for the current "WITH" semantics, and you can achieve the
>> same behavior by "OFFSET 0". And people are already using these as hints, so
>> I fail to see how this introduces anything new.
>>
>> In fact, if you see the optimization fence as an implicit query hint, this
>> actually *removes* a hint (although most users are unaware of that behavior
>> and use it unintentionally).
> 
> +1 down the line.  More to the point, for several years now we've (or
> at least I, but I'm not the only one) have been advocating for the
> usage of CTE to avoid the undocumented and bizarre OFFSET 0 trick.
> Jerking this out from users without giving a simple mechanic to get
> the same behavior minus a major query rewrite is blatantly user
> hostile.  I can't believe we're even contemplating it.   Also a GUC is
> not a solution for pretty obvious reasons I think.
> 

I'm not sure what you mean by "jerking this out from users". Isn't most 
of this thread about how to allow CTE inlining without hurting users 
unnecessarily?

I think we agree that:

* Just removing the optimization fence and telling users to use OFFSET 0 
instead is a no-go, just like removing the fencing and not providing any 
sensible replacement.

* GUC is not the solution.

Which leaves us with either WITH INLINE or WITH MATERIALIZE, or 
something along those lines.

If we go with WITH INLINE then we're likely not solving anything, 
because most people will simply use WITH just like now, and will be 
subject to the fencing without realizing it.

Or we will choose WITH MATERIALIZE, and then the users aware of the 
fencing (and using the CTEs for that purpose) will have to modify the 
queries. But does adding MATERIALIZE quality as major query rewrite?

Perhaps combining this with a GUC would be a solution. I mean, a GUC 
specifying the default behavior, and then INLINE / MATERIALIZE for 
individual CTEs in a query?

If you have an application intentionally using CTEs as a fence, just do
    ALTER DATABASE x SET enable_guc_fencing = on

and you don't have to rewrite the queries.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CTE inlining

From

Craig Ringer

Date:

03 May 2017, 08:19:22

On 3 May 2017 at 07:00, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> I'm not sure what you mean by "jerking this out from users". Isn't most of
> this thread about how to allow CTE inlining without hurting users
> unnecessarily?

He's referring to

Andreas Karlsson <andreas@proxel.se> wrote:
> 1. Just remove the optimization fence and let people add OFFSET 0 to their queries if they want an optimization
fence.This lets us keep pretending that we do not have query hints (and therefore do not have to formalize any syntax
forthem) while still allowing people to add optimization fences. 

... and the various agreement expressed with it.

I wish that's what had been done in the first place, but it's a bit
harsh on users to do it now.

We can keep living in fantasty-land where we deny hints while telling
people to use a washy-worded semi-documented hint to work around
issues.

Or we can take responsibility and admit we've had to grandfather a
limitation in as a blessed hint. Then unblock our way toward this
performance enhancement / user foot-gun without yanking the rug out
from under users who've been relying on our odd implementation detail
to date:

* Add "WITH MATERIALIZED" or "WITH NO_INLINE" or whatever
* Change "WITH" to inline where safe, initially only for single reference terms
* Mention the change in the release notes

Note that I'm in no way advocating real hints. I won't say I love all
aspects of project policy there, but it's clear there are benefits,
and that hints bring plenty of problems. If this weren't already
documented and widely advocated as a way to "fix" poor
pushdown/pullup/inlining decisions I'd be firmly in the "just fix it"
camp.

> * Just removing the optimization fence and telling users to use OFFSET 0
> instead is a no-go, just like removing the fencing and not providing any
> sensible replacement.

I could tolerate telling people to use OFFSET 0 (and documenting it!)
as a workaround if we can't get something more friendly in.

At least then we're back down to one hint-not-a-hint instead of two,
and it doesn't really block any useful optimisations.

> * GUC is not the solution.

Strong agreement.

> If we go with WITH INLINE then we're likely not solving anything, because
> most people will simply use WITH just like now, and will be subject to the
> fencing without realizing it.

Yes, and we're missing the opportunity to confirm with what other
systems do, and the spirit of the SQL language's declare what I want,
not how to do it, model.

> Or we will choose WITH MATERIALIZE, and then the users aware of the fencing
> (and using the CTEs for that purpose) will have to modify the queries. But
> does adding MATERIALIZE quality as major query rewrite?

Hardly.

> Perhaps combining this with a GUC would be a solution. I mean, a GUC
> specifying the default behavior, and then INLINE / MATERIALIZE for
> individual CTEs in a query?

It'd be nice if we could do that for a couple of releases as an
interim measure, but people will then get locked into relying on it,
and we'll never be able to remove it.

It's not like we don't make users do other things for upgrades, and
don't change performance in other ways. We don't even HAVE a
performance-test farm to look at regressions, planner behaviour
changes, etc. Yes, I know it's hard and nobody's volunteering, the
point is, we're hardly free of undocumented and unintentional
query-specific regressions let alone documented and relnoted ones.

So a sad -1 to me for a GUC.

Anyone big enough to be significantly upset by this planner change
will have a QA/staging deployment system anyway. Or should, because we
make enough other changes in a major release to make their life way
more interesting than this!

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] CTE inlining

From

Andrew Dunstan

Date:

03 May 2017, 16:54:39

On 05/02/2017 07:00 PM, Tomas Vondra wrote:
>
>
>
> I think we agree that:
>
> * Just removing the optimization fence and telling users to use OFFSET
> 0 instead is a no-go, just like removing the fencing and not providing
> any sensible replacement.
>
> * GUC is not the solution.

yes

>
> Which leaves us with either WITH INLINE or WITH MATERIALIZE, or
> something along those lines.
>
> If we go with WITH INLINE then we're likely not solving anything,
> because most people will simply use WITH just like now, and will be
> subject to the fencing without realizing it.

In many cases it won't actually matter much.

We're going to penalize some group of users no matter what we do. It
just seems a pity that it might be the group who actually took us at our
word. It's no skin off my nose - I will happily spend time finding
places where this will make things worse for customers.

>
> Or we will choose WITH MATERIALIZE, and then the users aware of the
> fencing (and using the CTEs for that purpose) will have to modify the
> queries. But does adding MATERIALIZE quality as major query rewrite?

It's not a major rewrite. But I can think of at least one former
customer who will have to go through a heck of a lot of code finding
where to add that one word.

>
> Perhaps combining this with a GUC would be a solution. I mean, a GUC
> specifying the default behavior, and then INLINE / MATERIALIZE for
> individual CTEs in a query?
>
> If you have an application intentionally using CTEs as a fence, just do
>
>     ALTER DATABASE x SET enable_guc_fencing = on
>
> and you don't have to rewrite the queries.
>

ITYM enable_cte_fencing.

I'm not sure it will help all that much unless we provide both decorator
variants, so people can remediate their code one query at a time, and
this guc would just govern the default.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CTE inlining

From

Claudio Freire

Date:

03 May 2017, 17:26:27

On Wed, May 3, 2017 at 2:19 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> Or we will choose WITH MATERIALIZE, and then the users aware of the fencing
>> (and using the CTEs for that purpose) will have to modify the queries. But
>> does adding MATERIALIZE quality as major query rewrite?
>
> Hardly.
>
>> Perhaps combining this with a GUC would be a solution. I mean, a GUC
>> specifying the default behavior, and then INLINE / MATERIALIZE for
>> individual CTEs in a query?
>
> It'd be nice if we could do that for a couple of releases as an
> interim measure, but people will then get locked into relying on it,
> and we'll never be able to remove it.

The proposed guc seems like a good idea, without which ORMs that
support CTEs would be at a loss. People using those ORMs that need
materialized behavior would have to wait for the ORM to catch up with
postgres syntax before upgrading, and that wouldn't be a nice thing.

It's not about requiring testing before upgrading, of course users
should/will do that. But if said testing says inlined CTEs perform
horribly, and the ORM has no support for the materialized keyword, the
only option is to not upgrade. With the CTE, people can upgrade,
changing the default behavior back to what it was.

That seems to me a useful thing to have.

Re: [HACKERS] CTE inlining

From

David Fetter

Date:

03 May 2017, 17:31:58

On Wed, May 03, 2017 at 11:26:27AM -0300, Claudio Freire wrote:
> On Wed, May 3, 2017 at 2:19 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> >> Or we will choose WITH MATERIALIZE, and then the users aware of
> >> the fencing (and using the CTEs for that purpose) will have to
> >> modify the queries. But does adding MATERIALIZE quality as major
> >> query rewrite?
> >
> > Hardly.
> >
> >> Perhaps combining this with a GUC would be a solution. I mean, a
> >> GUC specifying the default behavior, and then INLINE /
> >> MATERIALIZE for individual CTEs in a query?
> >
> > It'd be nice if we could do that for a couple of releases as an
> > interim measure, but people will then get locked into relying on
> > it, and we'll never be able to remove it.
> 
> The proposed guc seems like a good idea, without which ORMs that
> support CTEs would be at a loss.

Are you aware of such an ORM which both supports WITH and doesn't also
closely track PostgreSQL development?  I'm not. 

Even assuming that such a thing exists, it's not at all obvious to me
that we should be stalling and/or putting in what will turn out to be
misfeatures to accommodate it.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: [HACKERS] CTE inlining

From

Thomas Kellerer

Date:

03 May 2017, 18:51:46

> I could tolerate telling people to use OFFSET 0 (and documenting it!)
> as a workaround if we can't get something more friendly in.

I agree with that. 

> If we go with WITH INLINE then we're likely not solving anything, because
> most people will simply use WITH just like now, and will be subject to the
> fencing without realizing it.

I agree - the default behaviour should be change to match what everybody
expects. The current behaviour should be the exception. 

> Yes, and we're missing the opportunity to confirm with what other
> systems do, and the spirit of the SQL language's declare what I want,
> not how to do it, model.

Essentially *all* other systems optimize CTEs the same way they optimize
derived tables. I think even MySQL does it like that in the upcoming 8.0
release. 

I have never heard anyone saying that the Postgres implementation is an
advantage and that they would hate to see this disappear. I usually hear
"Why is Postgres doing that? Can't they change that?" 

Maybe I have a limited view on this, but from where I stand, simply changing
it would help everybody I know and would not break anything. I don't even
think a replacement for the old behaviour would be necessary.





--
View this message in context: http://www.postgresql-archive.org/CTE-inlining-tp5958992p5959509.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: [HACKERS] CTE inlining

From

Claudio Freire

Date:

03 May 2017, 19:27:38

On Wed, May 3, 2017 at 11:31 AM, David Fetter <david@fetter.org> wrote:
> On Wed, May 03, 2017 at 11:26:27AM -0300, Claudio Freire wrote:
>> On Wed, May 3, 2017 at 2:19 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> >> Or we will choose WITH MATERIALIZE, and then the users aware of
>> >> the fencing (and using the CTEs for that purpose) will have to
>> >> modify the queries. But does adding MATERIALIZE quality as major
>> >> query rewrite?
>> >
>> > Hardly.
>> >
>> >> Perhaps combining this with a GUC would be a solution. I mean, a
>> >> GUC specifying the default behavior, and then INLINE /
>> >> MATERIALIZE for individual CTEs in a query?
>> >
>> > It'd be nice if we could do that for a couple of releases as an
>> > interim measure, but people will then get locked into relying on
>> > it, and we'll never be able to remove it.
>>
>> The proposed guc seems like a good idea, without which ORMs that
>> support CTEs would be at a loss.
>
> Are you aware of such an ORM which both supports WITH and doesn't also
> closely track PostgreSQL development?  I'm not.
>
> Even assuming that such a thing exists, it's not at all obvious to me
> that we should be stalling and/or putting in what will turn out to be
> misfeatures to accommodate it.

I know SQLAlchemy does support CTEs, and lags quite considerably in
its support of the latest syntactic elements.

For instance, it took them 8 months to support the "skip locked" option.

Not sure whether that qualifies as "closely tracking" postgres for
you. Clearly they do track it, but that doesn't mean they're fast or
as fast as one would like/need.

Sure, that might not be enough to warrant the GUC. I would think so,
those are my 2 cents. YMMV.

Re: [HACKERS] CTE inlining

From

David Fetter

Date:

03 May 2017, 19:54:32

On Wed, May 03, 2017 at 01:27:38PM -0300, Claudio Freire wrote:
> On Wed, May 3, 2017 at 11:31 AM, David Fetter <david@fetter.org> wrote:
> > Are you aware of such an ORM which both supports WITH and doesn't
> > also closely track PostgreSQL development?  I'm not.
> >
> > Even assuming that such a thing exists, it's not at all obvious to
> > me that we should be stalling and/or putting in what will turn out
> > to be misfeatures to accommodate it.
> 
> I know SQLAlchemy does support CTEs, and lags quite considerably in
> its support of the latest syntactic elements.
> 
> For instance, it took them 8 months to support the "skip locked"
> option.

That is pretty strictly their problem.

> Not sure whether that qualifies as "closely tracking" postgres for
> you. Clearly they do track it, but that doesn't mean they're fast or
> as fast as one would like/need.

We can NOT make their tardiness a driver of our development.

> Sure, that might not be enough to warrant the GUC. I would think so,
> those are my 2 cents. YMMV.

When we add a "temporary" GUC, we're taking on a gigantic burden.
Either we support it forever somehow, or we put it on a deprecation
schedule immediately and expect to be answering questions about it for
years after it's been removed.

-1 for the GUC.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: [HACKERS] CTE inlining

From

Pavel Stehule

Date:

03 May 2017, 19:58:49

2017-05-03 18:54 GMT+02:00 David Fetter <david@fetter.org>:

On Wed, May 03, 2017 at 01:27:38PM -0300, Claudio Freire wrote:
> On Wed, May 3, 2017 at 11:31 AM, David Fetter <david@fetter.org> wrote:
> > Are you aware of such an ORM which both supports WITH and doesn't
> > also closely track PostgreSQL development? I'm not.
> >
> > Even assuming that such a thing exists, it's not at all obvious to
> > me that we should be stalling and/or putting in what will turn out
> > to be misfeatures to accommodate it.
>
> I know SQLAlchemy does support CTEs, and lags quite considerably in
> its support of the latest syntactic elements.
>
> For instance, it took them 8 months to support the "skip locked"
> option.

That is pretty strictly their problem.

> Not sure whether that qualifies as "closely tracking" postgres for
> you. Clearly they do track it, but that doesn't mean they're fast or
> as fast as one would like/need.

We can NOT make their tardiness a driver of our development.

> Sure, that might not be enough to warrant the GUC. I would think so,
> those are my 2 cents. YMMV.

When we add a "temporary" GUC, we're taking on a gigantic burden.
Either we support it forever somehow, or we put it on a deprecation
schedule immediately and expect to be answering questions about it for
years after it's been removed.

-1 for the GUC.

Is possible to find consensus without GUC? I understand well, why GUC is wrong, but I don't see any possible solution how to change current behave and don't break lot of applications.

Regards

Pavel

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CTE inlining

From

Alvaro Herrera

Date:

03 May 2017, 20:33:05

David Fetter wrote:

> When we add a "temporary" GUC, we're taking on a gigantic burden.
> Either we support it forever somehow, or we put it on a deprecation
> schedule immediately and expect to be answering questions about it for
> years after it's been removed.
> 
> -1 for the GUC.

Absolutely.

So ISTM we have three choices:

1) we switch unmarked CTEs as inlineable by default in pg11.  What seems
likely to happen for a user that upgrades to pg11 is that 5 out of 10
CTE-using queries are going to become faster than with pg10, and they
are going to be happy; 4 out of five are going to see no difference, but
they didn't have to do anything about it; and the remaining query is
going to become slower, either indistinguishably so (in which case they
don't care and they remain happy because of the other improvements) or
notably so, in which case they can easily figure where to add the
MATERIALIZED option and regain the original performance.

2) unmarked CTEs continue to be an optimization barrier, but we add
"WITH INLINED" so that they're inlineable.  Some users may wonder about
it and waste a lot of time trying to figure out which CTEs to add it to.
They see a benefit in half the queries, which makes them happy, but they
are angry that they had to waste all that time on the other queries.

3) We don't do anything, because we all agree that GUCs are not
suitable.  No progress.  No anger, but nobody is happy either.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CTE inlining

From

Pavel Stehule

Date:

03 May 2017, 20:44:17

2017-05-03 19:33 GMT+02:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

David Fetter wrote:

> When we add a "temporary" GUC, we're taking on a gigantic burden.
> Either we support it forever somehow, or we put it on a deprecation
> schedule immediately and expect to be answering questions about it for
> years after it's been removed.
>
> -1 for the GUC.

Absolutely.

So ISTM we have three choices:

1) we switch unmarked CTEs as inlineable by default in pg11. What seems
likely to happen for a user that upgrades to pg11 is that 5 out of 10
CTE-using queries are going to become faster than with pg10, and they
are going to be happy; 4 out of five are going to see no difference, but
they didn't have to do anything about it; and the remaining query is
going to become slower, either indistinguishably so (in which case they
don't care and they remain happy because of the other improvements) or
notably so, in which case they can easily figure where to add the
MATERIALIZED option and regain the original performance.

2) unmarked CTEs continue to be an optimization barrier, but we add
"WITH INLINED" so that they're inlineable. Some users may wonder about
it and waste a lot of time trying to figure out which CTEs to add it to.
They see a benefit in half the queries, which makes them happy, but they
are angry that they had to waste all that time on the other queries.

3) We don't do anything, because we all agree that GUCs are not
suitable. No progress. No anger, but nobody is happy either.

yes, these variants are all.

@2 is only little bit better than bad @3 - is not nice to have default behave different than any other and than what 90% developers expects.

Regards

Pavel

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CTE inlining

From

Kenneth Marshall

Date:

03 May 2017, 21:22:06

On Wed, May 03, 2017 at 02:33:05PM -0300, Alvaro Herrera wrote:
> David Fetter wrote:
> 
> > When we add a "temporary" GUC, we're taking on a gigantic burden.
> > Either we support it forever somehow, or we put it on a deprecation
> > schedule immediately and expect to be answering questions about it for
> > years after it's been removed.
> > 
> > -1 for the GUC.
> 
> Absolutely.
> 
> So ISTM we have three choices:
> 
> 1) we switch unmarked CTEs as inlineable by default in pg11.  What seems
> likely to happen for a user that upgrades to pg11 is that 5 out of 10
> CTE-using queries are going to become faster than with pg10, and they
> are going to be happy; 4 out of five are going to see no difference, but
> they didn't have to do anything about it; and the remaining query is
> going to become slower, either indistinguishably so (in which case they
> don't care and they remain happy because of the other improvements) or
> notably so, in which case they can easily figure where to add the
> MATERIALIZED option and regain the original performance.
> 
> 
> 2) unmarked CTEs continue to be an optimization barrier, but we add
> "WITH INLINED" so that they're inlineable.  Some users may wonder about
> it and waste a lot of time trying to figure out which CTEs to add it to.
> They see a benefit in half the queries, which makes them happy, but they
> are angry that they had to waste all that time on the other queries.
> 
> 
> 3) We don't do anything, because we all agree that GUCs are not
> suitable.  No progress.  No anger, but nobody is happy either.
> 

+1 for option 1. I just finished rewriting a well written CTE query to
avoid the optimization fence and get reasonable performance.

Regards,
Ken

Re: [HACKERS] CTE inlining

From

Andreas Karlsson

Date:

03 May 2017, 22:54:50

On 05/03/2017 07:33 PM, Alvaro Herrera wrote:
> 1) we switch unmarked CTEs as inlineable by default in pg11.  What seems
> likely to happen for a user that upgrades to pg11 is that 5 out of 10
> CTE-using queries are going to become faster than with pg10, and they
> are going to be happy; 4 out of five are going to see no difference, but
> they didn't have to do anything about it; and the remaining query is
> going to become slower, either indistinguishably so (in which case they
> don't care and they remain happy because of the other improvements) or
> notably so, in which case they can easily figure where to add the
> MATERIALIZED option and regain the original performance.
>
>
> 2) unmarked CTEs continue to be an optimization barrier, but we add
> "WITH INLINED" so that they're inlineable.  Some users may wonder about
> it and waste a lot of time trying to figure out which CTEs to add it to.
> They see a benefit in half the queries, which makes them happy, but they
> are angry that they had to waste all that time on the other queries.
>
>
> 3) We don't do anything, because we all agree that GUCs are not
> suitable.  No progress.  No anger, but nobody is happy either.

+1 for option 1. And while I would not like if we had to combine it with 
a backwards compatibility GUC which enables the old behavior to get it 
merged I still personally would prefer that over option 2 and 3.

Andreas

Re: [HACKERS] CTE inlining

From

Tomas Vondra

Date:

04 May 2017, 00:23:11


On 5/3/17 9:54 PM, Andreas Karlsson wrote:
> On 05/03/2017 07:33 PM, Alvaro Herrera wrote:
>> 1) we switch unmarked CTEs as inlineable by default in pg11.  What seems
>> likely to happen for a user that upgrades to pg11 is that 5 out of 10
>> CTE-using queries are going to become faster than with pg10, and they
>> are going to be happy; 4 out of five are going to see no difference, but
>> they didn't have to do anything about it; and the remaining query is
>> going to become slower, either indistinguishably so (in which case they
>> don't care and they remain happy because of the other improvements) or
>> notably so, in which case they can easily figure where to add the
>> MATERIALIZED option and regain the original performance.
>>
>>
>> 2) unmarked CTEs continue to be an optimization barrier, but we add
>> "WITH INLINED" so that they're inlineable.  Some users may wonder about
>> it and waste a lot of time trying to figure out which CTEs to add it to.
>> They see a benefit in half the queries, which makes them happy, but they
>> are angry that they had to waste all that time on the other queries.
>>
>>
>> 3) We don't do anything, because we all agree that GUCs are not
>> suitable.  No progress.  No anger, but nobody is happy either.
> 
> +1 for option 1. And while I would not like if we had to combine it with 
> a backwards compatibility GUC which enables the old behavior to get it 
> merged I still personally would prefer that over option 2 and 3.
> > Andreas>

+1 to what Andreas says

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CTE inlining

From

Merlin Moncure

Date:

04 May 2017, 00:24:05

On Wed, May 3, 2017 at 12:33 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> David Fetter wrote:
>
>> When we add a "temporary" GUC, we're taking on a gigantic burden.
>> Either we support it forever somehow, or we put it on a deprecation
>> schedule immediately and expect to be answering questions about it for
>> years after it's been removed.
>>
>> -1 for the GUC.
>
> Absolutely.
>
> So ISTM we have three choices:
>
> 1) we switch unmarked CTEs as inlineable by default in pg11.  What seems
> likely to happen for a user that upgrades to pg11 is that 5 out of 10
> CTE-using queries are going to become faster than with pg10, and they
> are going to be happy; 4 out of five are going to see no difference, but
> they didn't have to do anything about it; and the remaining query is
> going to become slower, either indistinguishably so (in which case they
> don't care and they remain happy because of the other improvements) or
> notably so, in which case they can easily figure where to add the
> MATERIALIZED option and regain the original performance.

+1 for option 1.  This change will be welcome for a large number of
queries, but forced materialization is a real need and I use it often.
This comes off as a very reasonable compromise in my opinion unless it
requires major coding gymnastics to implement.

merlin

Re: [HACKERS] CTE inlining

From

Gavin Flower

Date:

04 May 2017, 00:33:54

On 04/05/17 05:33, Alvaro Herrera wrote:
> David Fetter wrote:
>
>> When we add a "temporary" GUC, we're taking on a gigantic burden.
>> Either we support it forever somehow, or we put it on a deprecation
>> schedule immediately and expect to be answering questions about it for
>> years after it's been removed.
>>
>> -1 for the GUC.
> Absolutely.
>
> So ISTM we have three choices:
>
> 1) we switch unmarked CTEs as inlineable by default in pg11.  What seems
> likely to happen for a user that upgrades to pg11 is that 5 out of 10
> CTE-using queries are going to become faster than with pg10, and they
> are going to be happy; 4 out of five are going to see no difference, but
> they didn't have to do anything about it; and the remaining query is
> going to become slower, either indistinguishably so (in which case they
> don't care and they remain happy because of the other improvements) or
> notably so, in which case they can easily figure where to add the
> MATERIALIZED option and regain the original performance.
+1

I've watched a colleague spend hours trying to optimise a complicated 
query with nested views, then find that this 'optimisation fence' was 
the heart of the problem.


Cheers,
Gavin

Re: [HACKERS] CTE inlining

From

Andrew Dunstan

Date:

04 May 2017, 00:34:47


On 05/03/2017 05:24 PM, Merlin Moncure wrote:
> On Wed, May 3, 2017 at 12:33 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> David Fetter wrote:
>>
>>> When we add a "temporary" GUC, we're taking on a gigantic burden.
>>> Either we support it forever somehow, or we put it on a deprecation
>>> schedule immediately and expect to be answering questions about it for
>>> years after it's been removed.
>>>
>>> -1 for the GUC.
>> Absolutely.
>>
>> So ISTM we have three choices:
>>
>> 1) we switch unmarked CTEs as inlineable by default in pg11.  What seems
>> likely to happen for a user that upgrades to pg11 is that 5 out of 10
>> CTE-using queries are going to become faster than with pg10, and they
>> are going to be happy; 4 out of five are going to see no difference, but
>> they didn't have to do anything about it; and the remaining query is
>> going to become slower, either indistinguishably so (in which case they
>> don't care and they remain happy because of the other improvements) or
>> notably so, in which case they can easily figure where to add the
>> MATERIALIZED option and regain the original performance.
> +1 for option 1.  This change will be welcome for a large number of
> queries, but forced materialization is a real need and I use it often.
> This comes off as a very reasonable compromise in my opinion unless it
> requires major coding gymnastics to implement.
>


The only thing I am totally dead set against is making people go back to
using OFFSET 0. It's ugly and completely non-intuitive.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CTE inlining

From

Petr Jelinek

Date:

04 May 2017, 09:34:16

On 03/05/17 23:24, Merlin Moncure wrote:
> On Wed, May 3, 2017 at 12:33 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> David Fetter wrote:
>>
>>> When we add a "temporary" GUC, we're taking on a gigantic burden.
>>> Either we support it forever somehow, or we put it on a deprecation
>>> schedule immediately and expect to be answering questions about it for
>>> years after it's been removed.
>>>
>>> -1 for the GUC.
>>
>> Absolutely.
>>
>> So ISTM we have three choices:
>>
>> 1) we switch unmarked CTEs as inlineable by default in pg11.  What seems
>> likely to happen for a user that upgrades to pg11 is that 5 out of 10
>> CTE-using queries are going to become faster than with pg10, and they
>> are going to be happy; 4 out of five are going to see no difference, but
>> they didn't have to do anything about it; and the remaining query is
>> going to become slower, either indistinguishably so (in which case they
>> don't care and they remain happy because of the other improvements) or
>> notably so, in which case they can easily figure where to add the
>> MATERIALIZED option and regain the original performance.
> 
> +1 for option 1.  This change will be welcome for a large number of
> queries, but forced materialization is a real need and I use it often.
> This comes off as a very reasonable compromise in my opinion unless it
> requires major coding gymnastics to implement.
> 

+1 to this

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services

Re: [HACKERS] CTE inlining

From

Thomas Kellerer

Date:

04 May 2017, 10:25:31

> 1) we switch unmarked CTEs as inlineable by default in pg11. 

+1 from me for option 1




--
View this message in context: http://www.postgresql-archive.org/CTE-inlining-tp5958992p5959615.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: [HACKERS] CTE inlining

From

Gavin Flower

Date:

04 May 2017, 12:37:36

On 04/05/17 05:33, Alvaro Herrera wrote:> David Fetter wrote:>>> When we add a "temporary" GUC, we're taking on a
giganticburden.>> Either we support it forever somehow, or we put it on a deprecation>> schedule immediately and expect
tobe answering questions about it for>> years after it's been removed.>>>> -1 for the GUC.>> Absolutely.>> So ISTM we
havethree choices:>> 1) we switch unmarked CTEs as inlineable by default in pg11. What seems> likely to happen for a
userthat upgrades to pg11 is that 5 out of 10> CTE-using queries are going to become faster than with pg10, and they>
aregoing to be happy; 4 out of five are going to see no difference, but> they didn't have to do anything about it; and
theremaining query is> going to become slower, either indistinguishably so (in which case they> don't care and they
remainhappy because of the other improvements) or> notably so, in which case they can easily figure where to add the>
MATERIALIZEDoption and regain the original performance.
 

+1 (Mario, proxied by me)

I've been asked to pass on comments by my colleague Mario of True Group
regarding the pain of the current behaviour of CTE's being optimisation
fences (as he doesn't normally follow this email group, I'll pass on any
feedback):
    "Greetings,
    Jumping onto the bandwagon here. At True Group in Auckland, we use    PostgreSQL as the platform for large-scale
softwaredevelopment, often    with extensive and complex schemas, views and queries.
 
    We frequently encounter poor query performance due to CTEs being an    optimisation barrier. We are forced to use
workaroundssuch as a    set-returning function (parameterised view) in place of a view, and    manually placing the
neededquals (WHERE clauses) into each CTE. This    creates headaches in large systems, where writing expressive SQL is
 essential, rather than employing workarounds that make code harder to    understand and reuse.
 
    Our general assumption is that we should write SQL that describes what    we want to achieve. The planner's job is
todetermine how to do that    efficiently.
 
    There is an argument that pushing quals into CTEs could reduce    performance for some queries, especially if the
qualis expensive. My    answer is twofold:
 
    a) most queries will either experience no change or benefit from qual    push-down, as usually the expensive part
isthe query subtree, not the    top-level qual.
 
    b) if a small proportion of queries are negatively affected, this is    better addressed by improving the planner's
costestimation. At worst,    an explicit OPTIMIZATION BARRIER hint could be added. But I know there    is much
philosophicalobjection in the PostgreSQL community to planner    hints. The irony is that a main argument for retaining
currentCTE    behaviour is that people rely on CTEs as implicit barrier hints!
 
    As an aside, we also encounter several other instances where qual    push-down fails, including where rewording a
queryin a way that is    semantically identical can change whether push-down takes place. But    probably the greatest
bugbearis inability to push-down a qual into 
 
more    than one subtree, meaning if a query has two or more subtrees, each of    which is expensive, but only one row
isneeded from each, determined by    one qual at the top level, one subtree will get the qual pushed into it    and run
fast,while the others will do a full (expensive) table 
 
scan and    be filtered only afterwards. There are also gains to be had in 
improving    qual push-down in sub-selects with aggregates, and others. But I think    these issues are only due to
lackof resource to implement, rather than    philosophical/political objections.
 
    We would like to see a Postgres version in the future that does much    better planning in the areas I've
mentioned,including but not limited    to the present issue of CTEs. Our organisation may be able to 
 
contribute    materially to this, if the political will is there, and others affected    pitch in to achieve common
goals.Not being expert in Postgres    internals, I am not certain how difficult each of the problems is.
 
    I hope we have contributed usefully to the conversation, and invite    feedback.

    Sincerely,    Mario"


Cheers,
Gavin

Re: [HACKERS] CTE inlining

From

Gavin Flower

Date:

04 May 2017, 13:02:06

On 30/04/17 16:28, Tom Lane wrote:
> Craig Ringer <craig.ringer@2ndquadrant.com> writes:
>> - as you noted, it is hard to decide when it's worth inlining vs
>> materializing for CTE terms referenced more than once.
> [ raised eyebrow... ]  Please explain why the answer isn't trivially
> "never".
>
> There's already a pretty large hill to climb here in the way of
> breaking peoples' expectations about CTEs being optimization
> fences.  Breaking the documented semantics about CTEs being
> single-evaluation seems to me to be an absolute non-starter.
>
>             regards, tom lane
>
>
Could not each CTE be only evaluated once, but restricted (as far as is 
practicable) to the rows actually needed by the body of the SELECT?


Cheers,
Gavin

Re: [HACKERS] CTE inlining

From

Julien Rouhaud

Date:

04 May 2017, 14:45:01

On 04/05/2017 08:34, Petr Jelinek wrote:
> On 03/05/17 23:24, Merlin Moncure wrote:
>> On Wed, May 3, 2017 at 12:33 PM, Alvaro Herrera
>> <alvherre@2ndquadrant.com> wrote:
>>> David Fetter wrote:
>>>
>>>> When we add a "temporary" GUC, we're taking on a gigantic burden.
>>>> Either we support it forever somehow, or we put it on a deprecation
>>>> schedule immediately and expect to be answering questions about it for
>>>> years after it's been removed.
>>>>
>>>> -1 for the GUC.
>>>
>>> Absolutely.
>>>
>>> So ISTM we have three choices:
>>>
>>> 1) we switch unmarked CTEs as inlineable by default in pg11.  What seems
>>> likely to happen for a user that upgrades to pg11 is that 5 out of 10
>>> CTE-using queries are going to become faster than with pg10, and they
>>> are going to be happy; 4 out of five are going to see no difference, but
>>> they didn't have to do anything about it; and the remaining query is
>>> going to become slower, either indistinguishably so (in which case they
>>> don't care and they remain happy because of the other improvements) or
>>> notably so, in which case they can easily figure where to add the
>>> MATERIALIZED option and regain the original performance.
>>
>> +1 for option 1.  This change will be welcome for a large number of
>> queries, but forced materialization is a real need and I use it often.
>> This comes off as a very reasonable compromise in my opinion unless it
>> requires major coding gymnastics to implement.
>>
> 
> +1 to this
> 

+1 too

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Re: [HACKERS] CTE inlining

From

Serge Rielau

Date:

04 May 2017, 18:04:09

> On May 4, 2017, at 3:02 AM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:
>
> On 30/04/17 16:28, Tom Lane wrote:
>> Craig Ringer <craig.ringer@2ndquadrant.com> writes:
>>> - as you noted, it is hard to decide when it's worth inlining vs
>>> materializing for CTE terms referenced more than once.
>> [ raised eyebrow... ]  Please explain why the answer isn't trivially
>> "never".
>>
>> There's already a pretty large hill to climb here in the way of
>> breaking peoples' expectations about CTEs being optimization
>> fences.  Breaking the documented semantics about CTEs being
>> single-evaluation seems to me to be an absolute non-starter.
>>
>>             regards, tom lane
>>
>>
> Could not each CTE be only evaluated once, but restricted (as far as is practicable) to the rows actually needed by
thebody of the SELECT? 
>
Tom,

Are you worried about semantics or performance?
With proper detection of mutating functions and snapshot isolation I do not see how a user would detect “lack of”
singleevaluation. 
As for performance we’d be talking about what? An uncorrelated inner of a nested loop join?

Anyway it seems to me that there a multiple properties at play here which are quite orthogonal.

1. Full materialization/Slow materialization/pipelining I cannot come up with any scenario where full materialization
wouldbe beneficial from a performance point of view (which speaks to Gavin’s view). I can see it from a semantic point
ofview when order of execution may matter (for example with embedded DML and triggers present). As soon as semantics
areat play having syntax is absolutely the right thing: +1 for MATERIALIZE 
2.Pushing predicates (or other operators) into the CTE.  All this can ever do is reduce the number of rows being looked
at. As long as the optimizer is careful, not to do what it isn’t supposed to do in a nested query (push past a mutating
function)I don’t see the harm 
3. Replicating the CTE to push distinct operators from different consumers.  Again this can only be done if there are
nomutators or non deterministic operators. 

To summarize: big +1 to preserve the existing behavior with MATERIALIZE and to set CTE’s free by default with the onus
onthe optimizer to prove semantic equivalence. 

Cheers
Serge Rielau
Salesforce.com

Re: [HACKERS] CTE inlining

From

Tom Lane

Date:

04 May 2017, 18:36:50

Serge Rielau <serge@rielau.com> writes:
>> On May 4, 2017, at 3:02 AM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:
>> On 30/04/17 16:28, Tom Lane wrote:
>>> There's already a pretty large hill to climb here in the way of
>>> breaking peoples' expectations about CTEs being optimization
>>> fences.  Breaking the documented semantics about CTEs being
>>> single-evaluation seems to me to be an absolute non-starter.

> Are you worried about semantics or performance?
> With proper detection of mutating functions and snapshot isolation I do not see how a user would detect “lack of”
singleevaluation. 

I do not think a user will have any trouble "detecting" what we did
if his query runs for two hours, rather than two minutes, because we
executed some expensive function 100 times rather than the one time
he expected it to run.

Now you could argue that that's user error because he should have
marked the expensive function with a sufficiently high cost to
discourage us from flattening the CTE.  But not everyone will have
done that (and I'm not sure we have any planner smarts that would
respond to such cases anyway).  So what I'm saying is that if you're
promising there will be no visible bad consequences, you're wrong.

It may be worth breaking some peoples' queries to make other peoples'
queries faster, but I think we need to tread pretty carefully there.
And we definitely can't change any case where there would be visible
semantic differences.
        regards, tom lane

Re: [HACKERS] CTE inlining

From

Andrew Dunstan

Date:

04 May 2017, 19:22:36


On 05/04/2017 11:36 AM, Tom Lane wrote:
> Serge Rielau <serge@rielau.com> writes:
>>> On May 4, 2017, at 3:02 AM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:
>>> On 30/04/17 16:28, Tom Lane wrote:
>>>> There's already a pretty large hill to climb here in the way of
>>>> breaking peoples' expectations about CTEs being optimization
>>>> fences.  Breaking the documented semantics about CTEs being
>>>> single-evaluation seems to me to be an absolute non-starter.
>> Are you worried about semantics or performance?
>> With proper detection of mutating functions and snapshot isolation I do not see how a user would detect “lack of”
singleevaluation.
 
> I do not think a user will have any trouble "detecting" what we did
> if his query runs for two hours, rather than two minutes, because we
> executed some expensive function 100 times rather than the one time
> he expected it to run.
>
> Now you could argue that that's user error because he should have
> marked the expensive function with a sufficiently high cost to
> discourage us from flattening the CTE.  But not everyone will have
> done that (and I'm not sure we have any planner smarts that would
> respond to such cases anyway).  So what I'm saying is that if you're
> promising there will be no visible bad consequences, you're wrong.
>
> It may be worth breaking some peoples' queries to make other peoples'
> queries faster, but I think we need to tread pretty carefully there.
> And we definitely can't change any case where there would be visible
> semantic differences.
>

Yeah, the idea that this won't cause possibly significant pain is quite wrong. Quite by accident I came across an
examplejust this morning where rewriting as a CTE makes a big improvement.
 

I wrote this query:
   select (json_populate_record(null::mytype, myjson)).*   from mytable;


It turned out that this was an order of magnitude faster:
   with r as   (      select json_populate_record(null::mytype, myjson) as x      from mytable   )   select (x).*
fromr;
 


cheers

andrew



-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CTE inlining

From

"David G. Johnston"

Date:

04 May 2017, 19:34:19

On Thu, May 4, 2017 at 9:22 AM, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:

Yeah, the idea that this won't cause possibly significant pain is quite wrong. Quite by accident I came across an example just this morning where rewriting as a CTE makes a big improvement.

I wrote this query:

select (json_populate_record(null::mytype, myjson)).*
from mytable;

It turned out that this was an order of magnitude faster:

with r as
(
select json_populate_record(null::mytype, myjson) as x
from mytable
)
select (x).*
from r;

Except I suspect we at least have a chance to detect the above and not de-optimize it by evaluating "json_populate_record" once for every column in mytype.

The now idiomatic solution to the above is to use LATERAL so the above CTE is no longer actually a required workaround.

David J.

Re: [HACKERS] CTE inlining

From

Andres Freund

Date:

04 May 2017, 19:39:11

On 2017-05-04 09:34:19 -0700, David G. Johnston wrote:
> On Thu, May 4, 2017 at 9:22 AM, Andrew Dunstan <
> andrew.dunstan@2ndquadrant.com> wrote:
> 
> >
> > Yeah, the idea that this won't cause possibly significant pain is quite
> > wrong. Quite by accident I came across an example just this morning where
> > rewriting as a CTE makes a big improvement.
> >
> > I wrote this query:
> >
> >     select (json_populate_record(null::mytype, myjson)).*
> >     from mytable;
> >
> >
> > It turned out that this was an order of magnitude faster:
> >
> >     with r as
> >     (
> >        select json_populate_record(null::mytype, myjson) as x
> >        from mytable
> >     )
> >     select (x).*
> >     from r;

That's a very specific issue, that you can just as well solve with CTEs
though.  IIRC Tom and I were discussing changing this - independently of
anything CTE releated - so it doesn't end up calling the function
multiple times. We e.g. could now quite easily put the
json_populate_record(null::mytype, myjson) into something *like* a
nodeProjectSet node, and handle the column accesses via the nromal
projection.

> Except I suspect we at least have a chance to detect the above and not
> de-optimize it by evaluating "json_populate_record" once for every column
> in mytype.

That's what happens with the non CTE version - don't think it's super
likely we'd end up doing that for CTEs; I'd even consider it a bug.

Greetings,

Andres Freund

Re: [HACKERS] CTE inlining

From

Andrew Dunstan

Date:

04 May 2017, 20:20:20


On 05/04/2017 12:34 PM, David G. Johnston wrote:
> On Thu, May 4, 2017 at 9:22 AM, Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com
> <mailto:andrew.dunstan@2ndquadrant.com>>wrote:
>
>
>     Yeah, the idea that this won't cause possibly significant pain is
>     quite wrong. Quite by accident I came across an example just this
>     morning where rewriting as a CTE makes a big improvement.
>
>     I wrote this query:
>
>         select (json_populate_record(null::mytype, myjson)).*
>         from mytable;
>
>
>     It turned out that this was an order of magnitude faster:
>
>         with r as
>         (
>            select json_populate_record(null::mytype, myjson) as x
>            from mytable
>         )
>         select (x).*
>         from r;
>
>
> Except I suspect we at least have a chance to detect the above and
> not de-optimize it by evaluating "json_populate_record" once for every
> column in mytype.
>
> The now idiomatic solution to the above is to use LATERAL so the
> above CTE is no longer actually a required workaround.


Hadn't though about LATERAL, good point. Still, there will be other cases.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CTE inlining

From

Alvaro Herrera

Date:

04 May 2017, 20:33:39

Andrew Dunstan wrote:

> Hadn't though about LATERAL, good point. Still, there will be other cases.

I'm not sure what your point is.  We know that for some cases the
optimization barrier semantics are useful, which is why the proposal is
to add a keyword to install one explicitely:
        with materialized r as        (           select json_populate_record(null::mytype, myjson) as x           from
mytable       )        select (x).*        from r;
 

this would preserve the current semantics.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CTE inlining

From

Andrew Dunstan

Date:

04 May 2017, 20:44:12


On 05/04/2017 01:33 PM, Alvaro Herrera wrote:
> Andrew Dunstan wrote:
>
>> Hadn't though about LATERAL, good point. Still, there will be other cases.
> I'm not sure what your point is.  We know that for some cases the
> optimization barrier semantics are useful, which is why the proposal is
> to add a keyword to install one explicitely:
>
>          with materialized r as
>          (
>             select json_populate_record(null::mytype, myjson) as x
>             from mytable
>          )
>          select (x).*
>          from r;
>
> this would preserve the current semantics.
>


I understand. Some people appear to think hardly anyone will actually
need this - that's what I was arguing against.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CTE inlining

From

Joe Conway

Date:

04 May 2017, 20:52:01

On 05/04/2017 10:33 AM, Alvaro Herrera wrote:
> I'm not sure what your point is.  We know that for some cases the
> optimization barrier semantics are useful, which is why the proposal is
> to add a keyword to install one explicitely:
>
>          with materialized r as
>          (
>             select json_populate_record(null::mytype, myjson) as x
>             from mytable
>          )
>          select (x).*
>          from r;
>
> this would preserve the current semantics.

I haven't been able to follow this incredibly long thread, so please
excuse me if way off base, but are we talking about that a CTE would be
silently be rewritten as an inline expression potentially unless it is
decorated with some new syntax?

I would find that very disconcerting myself. For example, would this CTE
potentially get rewritten with multiple evaluation as follows?

DROP SEQUENCE IF EXISTS foo_seq;
CREATE SEQUENCE foo_seq;

WITH a(f1) AS (SELECT nextval('foo_seq'))
SELECT a.f1, a.f1 FROM a;f1 | ?column?
----+---------- 1 |        1
(1 row)

ALTER SEQUENCE foo_seq RESTART;
SELECT nextval('foo_seq'), nextval('foo_seq');nextval | ?column?
---------+----------      1 |        2
(1 row)

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Re: [HACKERS] CTE inlining

From

Andrew Dunstan

Date:

04 May 2017, 20:56:03


On 05/04/2017 01:52 PM, Joe Conway wrote:
> On 05/04/2017 10:33 AM, Alvaro Herrera wrote:
>> I'm not sure what your point is.  We know that for some cases the
>> optimization barrier semantics are useful, which is why the proposal is
>> to add a keyword to install one explicitely:
>>
>>          with materialized r as
>>          (
>>             select json_populate_record(null::mytype, myjson) as x
>>             from mytable
>>          )
>>          select (x).*
>>          from r;
>>
>> this would preserve the current semantics.
> I haven't been able to follow this incredibly long thread, so please
> excuse me if way off base, but are we talking about that a CTE would be
> silently be rewritten as an inline expression potentially unless it is
> decorated with some new syntax?
>
> I would find that very disconcerting myself. For example, would this CTE
> potentially get rewritten with multiple evaluation as follows?
>
> DROP SEQUENCE IF EXISTS foo_seq;
> CREATE SEQUENCE foo_seq;
>
> WITH a(f1) AS (SELECT nextval('foo_seq'))
> SELECT a.f1, a.f1 FROM a;
>  f1 | ?column?
> ----+----------
>   1 |        1
> (1 row)
>
> ALTER SEQUENCE foo_seq RESTART;
> SELECT nextval('foo_seq'), nextval('foo_seq');
>  nextval | ?column?
> ---------+----------
>        1 |        2
> (1 row)
>



I think that would be a change in semantics, which we should definitely
not be getting. Avoiding a change in semantics might be an interesting
exercise, but we have lots of clever coders ...

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CTE inlining

From

Tomas Vondra

Date:

04 May 2017, 21:00:11

On 5/4/17 7:56 PM, Andrew Dunstan wrote:
> 
> 
> On 05/04/2017 01:52 PM, Joe Conway wrote:
>> On 05/04/2017 10:33 AM, Alvaro Herrera wrote:
>>> I'm not sure what your point is.  We know that for some cases the
>>> optimization barrier semantics are useful, which is why the proposal is
>>> to add a keyword to install one explicitely:
>>>
>>>           with materialized r as
>>>           (
>>>              select json_populate_record(null::mytype, myjson) as x
>>>              from mytable
>>>           )
>>>           select (x).*
>>>           from r;
>>>
>>> this would preserve the current semantics.
>> I haven't been able to follow this incredibly long thread, so please
>> excuse me if way off base, but are we talking about that a CTE would be
>> silently be rewritten as an inline expression potentially unless it is
>> decorated with some new syntax?
>>
>> I would find that very disconcerting myself. For example, would this CTE
>> potentially get rewritten with multiple evaluation as follows?
>>
>> DROP SEQUENCE IF EXISTS foo_seq;
>> CREATE SEQUENCE foo_seq;
>>
>> WITH a(f1) AS (SELECT nextval('foo_seq'))
>> SELECT a.f1, a.f1 FROM a;
>>   f1 | ?column?
>> ----+----------
>>    1 |        1
>> (1 row)
>>
>> ALTER SEQUENCE foo_seq RESTART;
>> SELECT nextval('foo_seq'), nextval('foo_seq');
>>   nextval | ?column?
>> ---------+----------
>>         1 |        2
>> (1 row)
>>
> 
> 
> 
> I think that would be a change in semantics, which we should definitely
> not be getting. Avoiding a change in semantics might be an interesting
> exercise, but we have lots of clever coders ...
> 

nextval is a volatile function, and in my understanding the plan was not 
to inline CTEs with volatile functions (or CTEs doing writes etc.). That 
is, we'd guarantee the same results as we get now.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CTE inlining

From

Joe Conway

Date:

04 May 2017, 21:03:13

On 05/04/2017 10:56 AM, Andrew Dunstan wrote:
>
>
> On 05/04/2017 01:52 PM, Joe Conway wrote:
>> On 05/04/2017 10:33 AM, Alvaro Herrera wrote:
>>> I'm not sure what your point is.  We know that for some cases the
>>> optimization barrier semantics are useful, which is why the proposal is
>>> to add a keyword to install one explicitely:
>>>
>>>          with materialized r as
>>>          (
>>>             select json_populate_record(null::mytype, myjson) as x
>>>             from mytable
>>>          )
>>>          select (x).*
>>>          from r;
>>>
>>> this would preserve the current semantics.
>> I haven't been able to follow this incredibly long thread, so please
>> excuse me if way off base, but are we talking about that a CTE would be
>> silently be rewritten as an inline expression potentially unless it is
>> decorated with some new syntax?
>>
>> I would find that very disconcerting myself. For example, would this CTE
>> potentially get rewritten with multiple evaluation as follows?
>>
>> DROP SEQUENCE IF EXISTS foo_seq;
>> CREATE SEQUENCE foo_seq;
>>
>> WITH a(f1) AS (SELECT nextval('foo_seq'))
>> SELECT a.f1, a.f1 FROM a;
>>  f1 | ?column?
>> ----+----------
>>   1 |        1
>> (1 row)
>>
>> ALTER SEQUENCE foo_seq RESTART;
>> SELECT nextval('foo_seq'), nextval('foo_seq');
>>  nextval | ?column?
>> ---------+----------
>>        1 |        2
>> (1 row)
>>
>
> I think that would be a change in semantics, which we should definitely
> not be getting. Avoiding a change in semantics might be an interesting
> exercise, but we have lots of clever coders ...

Well I think my point is that I always have understood CTEs to be
executed precisely once producing a temporary result set that is then
referenced elsewhere. I don't think that property of CTEs should change.
Somewhere else in the thread someone mentioned predicate push down --
that makes sense and maybe some clever coder can come up with a patch
that does that, but I would not be in favor of CTEs being inlined and
therefore evaluated multiple times.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Re: [HACKERS] CTE inlining

From

Tom Lane

Date:

04 May 2017, 21:37:46

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I'm not sure what your point is.  We know that for some cases the
> optimization barrier semantics are useful, which is why the proposal is
> to add a keyword to install one explicitely:

>          with materialized r as
>          (
>             select json_populate_record(null::mytype, myjson) as x
>             from mytable
>          )
>          select (x).*
>          from r;

> this would preserve the current semantics.

Uh, no, it wouldn't, until you had changed your query (and made it
non-SQL-compliant, and unable to work at all on pre-v11 PG).
This might be a good design, but surely it does not provide backwards
compatibility.
        regards, tom lane

Re: [HACKERS] CTE inlining

From

Tomas Vondra

Date:

04 May 2017, 21:39:31


On 5/4/17 8:03 PM, Joe Conway wrote:
> On 05/04/2017 10:56 AM, Andrew Dunstan wrote:
>>
>>
>> On 05/04/2017 01:52 PM, Joe Conway wrote:
>>> On 05/04/2017 10:33 AM, Alvaro Herrera wrote:
>>>> I'm not sure what your point is.  We know that for some cases the
>>>> optimization barrier semantics are useful, which is why the proposal is
>>>> to add a keyword to install one explicitely:
>>>>
>>>>           with materialized r as
>>>>           (
>>>>              select json_populate_record(null::mytype, myjson) as x
>>>>              from mytable
>>>>           )
>>>>           select (x).*
>>>>           from r;
>>>>
>>>> this would preserve the current semantics.
>>> I haven't been able to follow this incredibly long thread, so please
>>> excuse me if way off base, but are we talking about that a CTE would be
>>> silently be rewritten as an inline expression potentially unless it is
>>> decorated with some new syntax?
>>>
>>> I would find that very disconcerting myself. For example, would this CTE
>>> potentially get rewritten with multiple evaluation as follows?
>>>
>>> DROP SEQUENCE IF EXISTS foo_seq;
>>> CREATE SEQUENCE foo_seq;
>>>
>>> WITH a(f1) AS (SELECT nextval('foo_seq'))
>>> SELECT a.f1, a.f1 FROM a;
>>>   f1 | ?column?
>>> ----+----------
>>>    1 |        1
>>> (1 row)
>>>
>>> ALTER SEQUENCE foo_seq RESTART;
>>> SELECT nextval('foo_seq'), nextval('foo_seq');
>>>   nextval | ?column?
>>> ---------+----------
>>>         1 |        2
>>> (1 row)
>>>
>>
>> I think that would be a change in semantics, which we should definitely
>> not be getting. Avoiding a change in semantics might be an interesting
>> exercise, but we have lots of clever coders ...
> 
> Well I think my point is that I always have understood CTEs to be
> executed precisely once producing a temporary result set that is then
> referenced elsewhere. I don't think that property of CTEs should change.
> Somewhere else in the thread someone mentioned predicate push down --
> that makes sense and maybe some clever coder can come up with a patch
> that does that, but I would not be in favor of CTEs being inlined and
> therefore evaluated multiple times.
> 

I agree with this, but there's a difference between "executed exactly 
once" and "producing the same result as if executed exactly once".

I may be misunderstanding what other people proposed in this thread, but 
I think the plan was to only inline CTEs where we know it won't change 
the results, etc. So e.g. CTEs with volatile functions would not get 
inlined, which includes nextval() for example.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CTE inlining

From

Tom Lane

Date:

04 May 2017, 21:52:34

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 5/4/17 8:03 PM, Joe Conway wrote:
>>> I haven't been able to follow this incredibly long thread, so please
>>> excuse me if way off base, but are we talking about that a CTE would be
>>> silently be rewritten as an inline expression potentially unless it is
>>> decorated with some new syntax?

> I agree with this, but there's a difference between "executed exactly 
> once" and "producing the same result as if executed exactly once".

> I may be misunderstanding what other people proposed in this thread, but 
> I think the plan was to only inline CTEs where we know it won't change 
> the results, etc. So e.g. CTEs with volatile functions would not get 
> inlined, which includes nextval() for example.

I haven't been keeping close tabs either, but surely we still have to have
the optimization fence in (at least) all these cases:

* CTE contains INSERT/UPDATE/DELETE
* CTE contains SELECT FOR UPDATE/SHARE (else the set of rows that get locked might change)
* CTE contains volatile functions

I'm willing to write off cases where, eg, a function should have been
marked volatile and was not.  That's user error and there are plenty
of hazards of that kind already.  But if the optimizer has reason
to know that discarding the fence might change any query side-effects,
it mustn't.
        regards, tom lane

Re: [HACKERS] CTE inlining

From

Serge Rielau

Date:

04 May 2017, 21:59:32

I haven't been keeping close tabs either, but surely we still have to have
the optimization fence in (at least) all these cases:

* CTE contains INSERT/UPDATE/DELETE
* CTE contains SELECT FOR UPDATE/SHARE (else the set of rows that get
locked might change)
* CTE contains volatile functions

I'm willing to write off cases where, eg, a function should have been
marked volatile and was not. That's user error and there are plenty
of hazards of that kind already. But if the optimizer has reason
to know that discarding the fence might change any query side-effects,
it mustn't.

Yes! +100

Cheers

Serge

Re: [HACKERS] CTE inlining

From

Gavin Flower

Date:

05 May 2017, 00:24:10

On 05/05/17 06:39, Tomas Vondra wrote:
>
>
> On 5/4/17 8:03 PM, Joe Conway wrote:
>> On 05/04/2017 10:56 AM, Andrew Dunstan wrote:
>>>
>>>
>>> On 05/04/2017 01:52 PM, Joe Conway wrote:
>>>> On 05/04/2017 10:33 AM, Alvaro Herrera wrote:
>>>>> I'm not sure what your point is.  We know that for some cases the
>>>>> optimization barrier semantics are useful, which is why the 
>>>>> proposal is
>>>>> to add a keyword to install one explicitely:
>>>>>
>>>>>           with materialized r as
>>>>>           (
>>>>>              select json_populate_record(null::mytype, myjson) as x
>>>>>              from mytable
>>>>>           )
>>>>>           select (x).*
>>>>>           from r;
>>>>>
>>>>> this would preserve the current semantics.
>>>> I haven't been able to follow this incredibly long thread, so please
>>>> excuse me if way off base, but are we talking about that a CTE 
>>>> would be
>>>> silently be rewritten as an inline expression potentially unless it is
>>>> decorated with some new syntax?
>>>>
>>>> I would find that very disconcerting myself. For example, would 
>>>> this CTE
>>>> potentially get rewritten with multiple evaluation as follows?
>>>>
>>>> DROP SEQUENCE IF EXISTS foo_seq;
>>>> CREATE SEQUENCE foo_seq;
>>>>
>>>> WITH a(f1) AS (SELECT nextval('foo_seq'))
>>>> SELECT a.f1, a.f1 FROM a;
>>>>   f1 | ?column?
>>>> ----+----------
>>>>    1 |        1
>>>> (1 row)
>>>>
>>>> ALTER SEQUENCE foo_seq RESTART;
>>>> SELECT nextval('foo_seq'), nextval('foo_seq');
>>>>   nextval | ?column?
>>>> ---------+----------
>>>>         1 |        2
>>>> (1 row)
>>>>
>>>
>>> I think that would be a change in semantics, which we should definitely
>>> not be getting. Avoiding a change in semantics might be an interesting
>>> exercise, but we have lots of clever coders ...
>>
>> Well I think my point is that I always have understood CTEs to be
>> executed precisely once producing a temporary result set that is then
>> referenced elsewhere. I don't think that property of CTEs should change.
>> Somewhere else in the thread someone mentioned predicate push down --
>> that makes sense and maybe some clever coder can come up with a patch
>> that does that, but I would not be in favor of CTEs being inlined and
>> therefore evaluated multiple times.
>>
>
> I agree with this, but there's a difference between "executed exactly 
> once" and "producing the same result as if executed exactly once".
>
> I may be misunderstanding what other people proposed in this thread, 
> but I think the plan was to only inline CTEs where we know it won't 
> change the results, etc. So e.g. CTEs with volatile functions would 
> not get inlined, which includes nextval() for example.
>
> regards
>
It was the behaviour of "producing the same result as if executed 
exactly once" that I was thinking of - I think this is still valid for 
triggers & volatile functions, but such behaviour should be clearly 
documented.  This what I implicitly thought about CTE's when I first 
came across them - to me it is the intuitively obvious behaviour.  
However, limiting the rows based on the body of the SELECT would often 
be a very useful optimisation


Cheers,
Gavin

Re: [HACKERS] CTE inlining

From

Andreas Karlsson

Date:

05 May 2017, 01:04:46

On 05/04/2017 06:22 PM, Andrew Dunstan wrote:
> I wrote this query:
>
>     select (json_populate_record(null::mytype, myjson)).*
>     from mytable;
>
>
> It turned out that this was an order of magnitude faster:
>
>     with r as
>     (
>        select json_populate_record(null::mytype, myjson) as x
>        from mytable
>     )
>     select (x).*
>     from r;

I do not know the planner that well, but I imagined that when we remove 
the optimization fence that one would be evaluated similar to if it had 
been a lateral join, i.e. there would be no extra function calls in this 
case after removing the fence.

Andreas

Re: [HACKERS] CTE inlining

From

Craig Ringer

Date:

05 May 2017, 03:03:01

On 5 May 2017 02:52, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 5/4/17 8:03 PM, Joe Conway wrote:
>>> I haven't been able to follow this incredibly long thread, so please
>>> excuse me if way off base, but are we talking about that a CTE would be
>>> silently be rewritten as an inline expression potentially unless it is
>>> decorated with some new syntax?

> I agree with this, but there's a difference between "executed exactly
> once" and "producing the same result as if executed exactly once".

> I may be misunderstanding what other people proposed in this thread, but
> I think the plan was to only inline CTEs where we know it won't change
> the results, etc. So e.g. CTEs with volatile functions would not get
> inlined, which includes nextval() for example.

I haven't been keeping close tabs either, but surely we still have to have
the optimization fence in (at least) all these cases:

* CTE contains INSERT/UPDATE/DELETE
* CTE contains SELECT FOR UPDATE/SHARE (else the set of rows that get
locked might change)
* CTE contains volatile functions

I'm willing to write off cases where, eg, a function should have been
marked volatile and was not. That's user error and there are plenty
of hazards of that kind already. But if the optimizer has reason
to know that discarding the fence might change any query side-effects,
it mustn't.

I think everyone is in total agreement there.

Re: [HACKERS] CTE inlining

From

Craig Ringer

Date:

05 May 2017, 03:16:42

On 5 May 2017 06:04, "Andreas Karlsson" <andreas@proxel.se> wrote:

On 05/04/2017 06:22 PM, Andrew Dunstan wrote:
I wrote this query:

select (json_populate_record(null::mytype, myjson)).*
from mytable;

It turned out that this was an order of magnitude faster:

with r as
(
select json_populate_record(null::mytype, myjson) as x
from mytable
)
select (x).*
from r;

I do not know the planner that well, but I imagined that when we remove the optimization fence that one would be evaluated similar to if it had been a lateral join, i.e. there would be no extra function calls in this case after removing the fence.

Sort of. PostgreSQL has a wart around (x).* expansion where it's essentially macro-expanded into

(x).a, (x).b, (x).c, ...

Now, if x is a function call, PG will merrily execute it n times for its n output columns.

Andres is working on fixing this. And it's trivially worked around with a lateral query ; the above would be better written as

select (x).*

from mytable

cross join lateral json_populate_record(null::mytype, myjson) as x;

So this example just abuses our optimiser hint behaviour for CTEs to avoid solving a planner issue (why project policy is against hints). But there's already a solution.

I'm finding it increasingly hilarious watching people vociferously defending their one precious (semi-documented) query/optimiser hint in PostgreSQL. The one we don't admit is a hint, but treat as one by avoiding optimising across it when it's safe to do so.

We can't remove or change our precious hint because we need it to solve production issues. But we don't have hints because then people wouldn't report planner/optimiser issues, would lock in bad plans and then complain about it, etc.

Just like what's happening here. And people are leaping to defend it, lest we risk exposing performance issues by changing anything, even though all we're doing is documenting what is already so.

Hey. Crazy idea for backward compat to address Tom's complaint that adding explicit syntax would require people who wanted the old behaviour to make their queries incompatible with pg10 and below. Add the "MATERIALIZED" keyword or whatever. The back patch the keyword as a no-op, since that's what we already do in back branches. I can't see anything that could possibly break in that context so long as we only go as far back as it was already a keyword elsewhere.

We could at least add it to pg10.

Re: [HACKERS] CTE inlining

From

Joe Conway

Date:

05 May 2017, 03:17:05

On 05/04/2017 05:03 PM, Craig Ringer wrote:
> On 5 May 2017 02:52, "Tom Lane" wrote:
>     I haven't been keeping close tabs either, but surely we still have
>     to have
>     the optimization fence in (at least) all these cases:
>
>     * CTE contains INSERT/UPDATE/DELETE
>     * CTE contains SELECT FOR UPDATE/SHARE (else the set of rows that get
>       locked might change)
>     * CTE contains volatile functions
>
>     I'm willing to write off cases where, eg, a function should have been
>     marked volatile and was not.  That's user error and there are plenty
>     of hazards of that kind already.  But if the optimizer has reason
>     to know that discarding the fence might change any query side-effects,
>     it mustn't.
>
> I think everyone is in total agreement there.

That's great, but if so, why do we need any change in syntax at all?

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Re: [HACKERS] CTE inlining

From

Craig Ringer

Date:

05 May 2017, 04:30:47

On 5 May 2017 at 08:17, Joe Conway <mail@joeconway.com> wrote:
> On 05/04/2017 05:03 PM, Craig Ringer wrote:
>> On 5 May 2017 02:52, "Tom Lane" wrote:
>>     I haven't been keeping close tabs either, but surely we still have
>>     to have
>>     the optimization fence in (at least) all these cases:
>>
>>     * CTE contains INSERT/UPDATE/DELETE
>>     * CTE contains SELECT FOR UPDATE/SHARE (else the set of rows that get
>>       locked might change)
>>     * CTE contains volatile functions
>>
>>     I'm willing to write off cases where, eg, a function should have been
>>     marked volatile and was not.  That's user error and there are plenty
>>     of hazards of that kind already.  But if the optimizer has reason
>>     to know that discarding the fence might change any query side-effects,
>>     it mustn't.
>>
>> I think everyone is in total agreement there.
>
> That's great, but if so, why do we need any change in syntax at all?

Because a lot of people use it as a query hint, and don't want to let
go of the one precious query hint they rely on in PostgreSQL to solve
urgent production issues or work around planner limitations. You can
currently (ab)use a CTE with fencing behaviour to force join order,
force evaluation of expensive functions, etc, if the planner would
otherwise pick a semantically identical plan that lands up running
slower.

Many people don't want to give up their hint because it's an important
tool in their "solve user/customer problems fast" toolbox.

Project policy says we don't want query hints, don't have them, and
won't add them. I understand the reasoning for this (and mostly
agree). But we kind of added one by accident anyway, and it's proving
rather hard to take it away!

We added it by documenting an implementation detail/limitation - that
there's limited or no inlining/pullup/pushdown across CTE terms.
People took that as blessing to rely on that behaviour in future
releases, treating it as an optimisation barrier query hint. Since
then some people have been promoting it as a workaround for
performance issues faced by queries with sub-optimal plans.

People "in the know" used to use OFFSET 0 for that instead, but it was
entirely undocumented and could go away in a future release without
any warning, plus it was uglier. The CTE limitation-as-feature got a
lot more press in blogs, etc since it's documented-ish.

So now we're collectively trying to get to a compromise position where
people who rely on it can keep the functionality by declaring the
no-inline requirement explicitly in their SQL where they depend on it.
That way it won't penalise everyone else who just wants the optimiser
to do its best the rest of the time, so we can just write SQL that
says "I want this outcome" without having to worry about a
semi-documented performance booby-trap. So surprise and confuse users
who're (reasonably) expecting the optimiser to treat a CTE like a
subquery, view or SQL function, those who're writing portable SQL and
those  who're porting from other DBMSes.

The sticking point is that this seems to require admitting that it
serves as a hint, of sorts, where it forces the optimiser to choose a
different plan than its cost based estimates would suggest if all
candidate plans are semantically identical.

We're carefully maintaining this bizarre cognitive dissonance where we
justify the need for using this as a planner hint at the same time as
denying that we have a hint. That makes it hard to make progress here.
I think there's fear that we're setting some kind of precedent by
admitting what we already have.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Remote DBA, Training
&Services

Re: [HACKERS] CTE inlining

From

Tom Lane

Date:

05 May 2017, 05:04:29

Craig Ringer <craig.ringer@2ndquadrant.com> writes:
> We're carefully maintaining this bizarre cognitive dissonance where we
> justify the need for using this as a planner hint at the same time as
> denying that we have a hint. That makes it hard to make progress here.
> I think there's fear that we're setting some kind of precedent by
> admitting what we already have.

I think you're overstating the case.  It's clear that there's a
significant subset of CTE functionality where there has to be an
optimization fence.  The initial implementation basically took the
easy way out by deeming *all* CTEs to be optimization fences.  Maybe
we shouldn't have documented that behavior, but we did.  Now we're
arguing about how much of a compatibility break it'd be to change that
planner behavior.  I don't see any particular cognitive dissonance here,
just disagreements about the extent to which backwards compatibility is
more important than better query optimization.
        regards, tom lane

Re: [HACKERS] CTE inlining

From

Serge Rielau

Date:

05 May 2017, 05:10:47

In my past life when I was faced with such debates I argued that the number of customers We are hoping to attract in the future is much bigger than the ones we risk offending.

Doesn't mean I wanted to piss everyone off. Just that I didn't want to be held hostage by history.

Cheers

Serge

PS: On the opposing side was typically the mainframe crowd.... hint hint

Re: [HACKERS] CTE inlining

From

Stephen Frost

Date:

05 May 2017, 05:14:24

Tom, all,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Craig Ringer <craig.ringer@2ndquadrant.com> writes:
> > We're carefully maintaining this bizarre cognitive dissonance where we
> > justify the need for using this as a planner hint at the same time as
> > denying that we have a hint. That makes it hard to make progress here.
> > I think there's fear that we're setting some kind of precedent by
> > admitting what we already have.
>
> I think you're overstating the case.  It's clear that there's a
> significant subset of CTE functionality where there has to be an
> optimization fence.  The initial implementation basically took the
> easy way out by deeming *all* CTEs to be optimization fences.  Maybe
> we shouldn't have documented that behavior, but we did.  Now we're
> arguing about how much of a compatibility break it'd be to change that
> planner behavior.  I don't see any particular cognitive dissonance here,
> just disagreements about the extent to which backwards compatibility is
> more important than better query optimization.

This really begs the question of if it's "better."

If it's always better, then there's no discussion to be had.  If there
are cases where it's not, then that could cause problems for people and
there's a reason to question if we're ok with that.

Of course, if a user runs into such an issue, there's a few ways they
can address it without needs any additional syntax (use a temp table
instead, introduce a volative function into the CTE, etc).

I had originally been on the fence about this myself, and even advocated
the new-syntax approach at one point (as I recall), but with the new
statistics we're gaining and the improvmenets in the planner, along with
parallel query and the advantages offered by being able to pull a CTE
into the main query, I think I've come around to agree that removing the
optimization fence, when it won't change the resullts, is a good thing
to do.

Thanks!

Stephen

Re: [HACKERS] CTE inlining

From

Joe Conway

Date:

05 May 2017, 05:57:21

On 05/04/2017 07:04 PM, Tom Lane wrote:
> Craig Ringer <craig.ringer@2ndquadrant.com> writes:
>> We're carefully maintaining this bizarre cognitive dissonance where we
>> justify the need for using this as a planner hint at the same time as
>> denying that we have a hint. That makes it hard to make progress here.
>> I think there's fear that we're setting some kind of precedent by
>> admitting what we already have.
>
> I think you're overstating the case.  It's clear that there's a
> significant subset of CTE functionality where there has to be an
> optimization fence.  The initial implementation basically took the
> easy way out by deeming *all* CTEs to be optimization fences.  Maybe
> we shouldn't have documented that behavior, but we did.  Now we're
> arguing about how much of a compatibility break it'd be to change that
> planner behavior.  I don't see any particular cognitive dissonance here,
> just disagreements about the extent to which backwards compatibility is
> more important than better query optimization.

Exactly.

One thought, is that we treat a CTE in a similar way to foreign tables,
with the same set of push downs.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Re: [HACKERS] CTE inlining

From

Andres Freund

Date:

05 May 2017, 06:00:02

Hi,

On 2017-05-04 19:57:21 -0700, Joe Conway wrote:
> One thought, is that we treat a CTE in a similar way to foreign tables,
> with the same set of push downs.

A bit surprised about that suggestion - there seems to be very little
similarity between the cases. What'd be the benefit of that?  With FDWs
there's not even a common set of things pushed down between different
FDW users.

Greetings,

Andres Freund

Re: [HACKERS] CTE inlining

From

Albe Laurenz

Date:

05 May 2017, 10:27:58

Thomas Kellerer wrote:
>> 1) we switch unmarked CTEs as inlineable by default in pg11.
>
> +1 from me for option 1

+1 from me as well, FWIW.

Yours,
Laurenz Albe

Re: [HACKERS] CTE inlining

From

David Rowley

Date:

05 May 2017, 15:43:33

On 5 May 2017 at 14:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Craig Ringer <craig.ringer@2ndquadrant.com> writes:
>> We're carefully maintaining this bizarre cognitive dissonance where we
>> justify the need for using this as a planner hint at the same time as
>> denying that we have a hint. That makes it hard to make progress here.
>> I think there's fear that we're setting some kind of precedent by
>> admitting what we already have.
>
> I think you're overstating the case.  It's clear that there's a
> significant subset of CTE functionality where there has to be an
> optimization fence.  The initial implementation basically took the
> easy way out by deeming *all* CTEs to be optimization fences.  Maybe
> we shouldn't have documented that behavior, but we did.  Now we're
> arguing about how much of a compatibility break it'd be to change that
> planner behavior.  I don't see any particular cognitive dissonance here,
> just disagreements about the extent to which backwards compatibility is
> more important than better query optimization.

How about we get the ball rolling on this in v10 and pull that part
out of the docs. If anything that'll buy us a bit more wiggle room to
change this in v11.

I've attached a proposed patch.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

doc_caution_about_cte_changes_in_the_future.patch

Re: [HACKERS] CTE inlining

From

Vik Fearing

Date:

06 May 2017, 23:38:33

On 05/03/2017 07:33 PM, Alvaro Herrera wrote:
> 1) we switch unmarked CTEs as inlineable by default in pg11.  What seems
> likely to happen for a user that upgrades to pg11 is that 5 out of 10
> CTE-using queries are going to become faster than with pg10, and they
> are going to be happy; 4 out of five are going to see no difference, but
> they didn't have to do anything about it; and the remaining query is
> going to become slower, either indistinguishably so (in which case they
> don't care and they remain happy because of the other improvements) or
> notably so, in which case they can easily figure where to add the
> MATERIALIZED option and regain the original performance.

I am overwhelmingly in favor of this option.

I am in favor of an enable_inlinedcte guc in the same spirit as the
other enable_* gucs, but violently opposed to any "compatibility" guc.

-- 
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: [HACKERS] CTE inlining

From

Ilya Shkuratov

Date:

09 May 2017, 21:26:24

Ok, it seems that most people in discussion are agree that removing optimization
fence is a right thing to do. But so far the main topic was whether it worth to 
make "inlining" by default, and how we should enable it.

Nonetheless I still hoping to discuss the algorithm and its implementation. 

I suppose, in case of a single reference we can validate CTE subquery and inline it
just before SS_process_ctes() in subquery_planner() and then process remaining
CTEs as before. 

The case of multiple reference is more interesting.
Ideally, we would decide whether to inline just before pull_up_sublinks(), so all 
the optimizations can be applied to inlined subquery. But It is impossible as we 
have no information to build subquery paths and estimate they costs at this point. 
All necessary initialization is performed in query_planner(), that invoked far
later in grouping_planner(). (As far as I understand.)

The most straighforward way is to compare CTE scan cost with subquery execution
and result scan cost in set_rel_size(), just after set_cte_pathlist(), and alter 
RelOptInfo, if we choose to inline.
(e.g (CTE scan) < (cheapest_path(subquery) + subquery scan))
This way we still can push down predicates as it is performed in 
set_subquery_pathlist(), but we missed pull_up_subquery().
Besides, it seems like a dirty quick solution.

Maybe it possible to add subquery scan to RTE_CTE RelOptInfo, but I'm not sure.

So what is a right way to conduct comparison between CTE scan and subquery 
execution with subsequent scan?

I am new to PostgreSQL development, so I need a guidance from someone who 
familiar with optimizer infrastructure to ensure that I moving in a right 
direction and not making something weird.


P.S. There is a paper [1] describing implementation of CTE optimization in Orca 
optimizer. It may be useful, though architecture is completely different.

[1] Optimization of Common Table Expressions in MPP Database Systems 
(http://www.vldb.org/pvldb/vol8/p1704-elhelw.pdf)


Ilya Shkuratov

Re: [HACKERS] CTE inlining

From

Peter Eisentraut

Date:

09 May 2017, 22:15:41

On 5/5/17 08:43, David Rowley wrote:
> How about we get the ball rolling on this in v10 and pull that part
> out of the docs. If anything that'll buy us a bit more wiggle room to
> change this in v11.
> 
> I've attached a proposed patch.

If we just tell them that the thing they might have relied on might go
away, without a replacement to suggest, then we're just confusing and
scaring them, no?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CTE inlining

From

"David G. Johnston"

Date:

09 May 2017, 22:22:54

On Tue, May 9, 2017 at 12:15 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

On 5/5/17 08:43, David Rowley wrote:
> How about we get the ball rolling on this in v10 and pull that part
> out of the docs. If anything that'll buy us a bit more wiggle room to
> change this in v11.
>
> I've attached a proposed patch.

If we just tell them that the thing they might have relied on might go
away, without a replacement to suggest, then we're just confusing and
scaring them, no?

We'd end up suggesting our OFFSET 0 hack as true protection. If they know for a fact that their use of CTE for its barrier properties is not supported they are also more likely to document intentional usage with something like: "-- CHANGE THIS ONCE VERSION 11 IS RELEASED!!! --" which would make finding the call sites that need to add the new "MATERIALIZED" keyword much easier.

David J.

Re: [HACKERS] CTE inlining

From

Serge Rielau

Date:

09 May 2017, 22:29:31

On Tue, May 9, 2017 at 12:22 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Tue, May 9, 2017 at 12:15 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 5/5/17 08:43, David Rowley wrote:
> How about we get the ball rolling on this in v10 and pull that part
> out of the docs. If anything that'll buy us a bit more wiggle room to
> change this in v11.
>
> I've attached a proposed patch.

If we just tell them that the thing they might have relied on might go
away, without a replacement to suggest, then we're just confusing and
scaring them, no?

We'd end up suggesting our OFFSET 0 hack as true protection. If they know for a fact that their use of CTE for its barrier properties is not supported they are also more likely to document intentional usage with something like: "-- CHANGE THIS ONCE VERSION 11 IS RELEASED!!! --" which would make finding the call sites that need to add the new "MATERIALIZED" keyword much easier.

How about adding MATERIALIZED now (in 10) as a noise word.

Give people a release to switch over before pulling the rug..

Cheers

Serge

Re: [HACKERS] CTE inlining

From

Tom Lane

Date:

09 May 2017, 22:36:42

"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Tue, May 9, 2017 at 12:15 PM, Peter Eisentraut <
> peter.eisentraut@2ndquadrant.com> wrote:
>> If we just tell them that the thing they might have relied on might go
>> away, without a replacement to suggest, then we're just confusing and
>> scaring them, no?

> We'd end up suggesting our OFFSET 0 hack as true protection.

Considering that many of the commenters in this thread view OFFSET 0
as a vile hack that ought to go away, I can hardly see how that's
an improvement.

I tend to agree with Peter that there's no need to do anything until
we have a committable code improvement.  Documentation changes that
push people towards adding OFFSET 0, without any certainty that that
will be the long-term answer, do not seem like a net positive.

Also, considering that this behavior has been there since 8.4,
I think it's sheerest chutzpah to claim that changing the docs in
v10 would materially reduce the backward-compatibility concerns
for whatever we might do in v11.
        regards, tom lane

Re: [HACKERS] CTE inlining

From

"David G. Johnston"

Date:

09 May 2017, 22:51:58

On Tue, May 9, 2017 at 12:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Also, considering that this behavior has been there since 8.4,
I think it's sheerest chutzpah to claim that changing the docs in
v10 would materially reduce the backward-compatibility concerns
for whatever we might do in v11.

No it won't, but those who are using 10 as their first version would probably be happy if this was covered in a bit more depth. Even a comment like "Unlike most other DBMS PostgreSQL presently executes the subquery in the CTE in relative isolation. It is suggested that you document any intentional usage of this optimization fence as a query planning hint so that should the default change in the future you can update the query to support whatever official syntax is implemented to retain this behavior.

We cannot really help those who have been using this since 8.4 and won't read the relevant docs because they don't need to learn anything new; but we can at least avoid subjecting newer customers. I'm not that strongly in favor of it but it would be a nice touch - even if it won't change the risk/reward calculation in any meaningful way.

David J.

Re: [HACKERS] CTE inlining

From

Tom Lane

Date:

09 May 2017, 23:14:20

"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Tue, May 9, 2017 at 12:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Also, considering that this behavior has been there since 8.4,
>> I think it's sheerest chutzpah to claim that changing the docs in
>> v10 would materially reduce the backward-compatibility concerns
>> for whatever we might do in v11.

> No it won't, but those who are using 10 as their first version would
> probably be happy if this was covered in a bit more depth.

I think the existing doc text is perfectly clear (while David's proposed
replacement text is not).

> Even a comment
> like "Unlike most other DBMS PostgreSQL presently executes the subquery in
> the CTE in relative isolation.  It is suggested that you document any
> intentional usage of this optimization fence as a query planning hint so
> that should the default change in the future you can update the query to
> support whatever official syntax is implemented to retain this behavior.

Well, TBH that is pre-judging what (if anything) is going to be changed
by a feature that we don't even have design consensus on, let alone a
patch for.  I don't think that's an improvement or a good service to
our users; it's just promoting confusion.  I think that until we do have
a design and a patch, we're better off leaving the docs reading the way
they have for the past eight years.

I'm also a bit mystified by the apparent urgency to change something,
anything, in this area when we're already past feature freeze.  This
would be a fit subject for discussion several months from now when
v11 development opens, but right now it's just distracting us from
stabilizing v10.
        regards, tom lane

Re: [HACKERS] CTE inlining

From

Tom Lane

Date:

10 May 2017, 00:14:19

Ilya Shkuratov <motr.ilya@ya.ru> writes:
> Ok, it seems that most people in discussion are agree that removing optimization
> fence is a right thing to do.
> Nonetheless I still hoping to discuss the algorithm and its implementation.

Yeah, so far we've mainly discussed whether to do that and how to control
it, not what the actual results would be.

> I suppose, in case of a single reference we can validate CTE subquery and inline it
> just before SS_process_ctes() in subquery_planner() and then process remaining
> CTEs as before.

Yeah, something like that ought to be possible.  I'd prefer to think of it
as related to the work prepjointree.c does, ie pull_up_sublinks, which
raises the question why we do SS_process_ctes before that not after it.

You might want to start by postponing planning of CTE subqueries as late
as possible, without any other changes.  In particular, I think it'd be
a good thing to postpone Path-to-Plan conversion of the subqueries
until much later, ideally not till the final toplevel create_plan() call.
This is tied up with early Plan creation for regular subqueries too, so
that might have to be an all-or-nothing conversion.  But we aren't really
going to have much flexibility of planning for subqueries until we do
that.

> The case of multiple reference is more interesting.
> Ideally, we would decide whether to inline just before pull_up_sublinks(), so all
> the optimizations can be applied to inlined subquery. But It is impossible as we
> have no information to build subquery paths and estimate they costs at this point.

TBH, I would just ignore that completely, at least until the simpler
case is done and committed.  Trying to bite that off as part of the
initial patch is likely to lead to never getting anything done at all.
And I'm not exactly convinced either that it will win often enough to
be worth the trouble, or that users would thank you for rewriting their
queries that way.

When and if we get to this, you could imagine tackling it a bit like
preprocess_minmax_aggregates, or like what I did with OR clauses in
<22002.1487099934@sss.pgh.pa.us>, ie just repeat a lot of the work
over again to get several complete Path trees, and then pick one.
This is kind of expensive and ugly but it's hard to see how to do it
sensibly in a bottom-up fashion.
        regards, tom lane

Re: [HACKERS] CTE inlining

From

Andrew Dunstan

Date:

10 May 2017, 01:01:01


On 05/09/2017 04:14 PM, Tom Lane wrote:
>
> Well, TBH that is pre-judging what (if anything) is going to be changed
> by a feature that we don't even have design consensus on, let alone a
> patch for.  I don't think that's an improvement or a good service to
> our users; it's just promoting confusion.  I think that until we do have
> a design and a patch, we're better off leaving the docs reading the way
> they have for the past eight years.
>
> I'm also a bit mystified by the apparent urgency to change something,
> anything, in this area when we're already past feature freeze.  This
> would be a fit subject for discussion several months from now when
> v11 development opens, but right now it's just distracting us from
> stabilizing v10.
>
>             


+1. We should not be in the business of documenting or declaring our
intention of having vaporware. Say we get a patch and for some reason it
gets rejected. We've had features that took two or three releases to get
accepted. This is why we've historically shied away from roadmaps. And
we've been bitten in the past by deprecating things only to undeprecate
them later.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CTE inlining

From

Yaroslav

Date:

12 May 2017, 00:37:27

Ilya Shkuratov wrote
> First of all, to such replacement to be valid, the CTE must be 
>     1. non-writable (e.g. be of form: SELECT ...),
>     2. do not use VOLATILE or STABLE functions,
>     3. ... (maybe there must be more restrictions?) 

What about simple things like this?

CREATE OR REPLACE FUNCTION z(numeric) RETURNS boolean AS $$
BEGIN
RETURN $1 <> 0;
END;
$$ LANGUAGE plpgSQL IMMUTABLE COST 1000;

-- This one works:
WITH T AS (
SELECT 1.0 AS v1, 0.0 AS v2
UNION ALL
SELECT 3.0, 1.0
UNION ALL
SELECT 2.0, 0.0
), a AS (
SELECT * FROM tWHERE z(v2)
)
SELECT * FROM aWHERE v1/v2 > 1.5;

-- This one gives 'division by zero':
WITH T AS (
SELECT 1.0 AS v1, 0.0 AS v2
UNION ALL
SELECT 3.0, 1.0
UNION ALL
SELECT 2.0, 0.0
)
SELECT * FROM (      SELECT *        FROM t       WHERE z(v2)      ) AS aWHERE v1/v2 > 1.5;




-----
WBR, Yaroslav Schekin.
--
View this message in context: http://www.postgresql-archive.org/CTE-inlining-tp5958992p5961086.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: [HACKERS] CTE inlining

From

Andreas Joseph Krogh

Date:

12 May 2017, 09:35:48

På torsdag 11. mai 2017 kl. 23:37:27, skrev Yaroslav <ladayaroslav@yandex.ru>:

Ilya Shkuratov wrote
> First of all, to such replacement to be valid, the CTE must be
> 1. non-writable (e.g. be of form: SELECT ...),
> 2. do not use VOLATILE or STABLE functions,
> 3. ... (maybe there must be more restrictions?)

What about simple things like this?

CREATE OR REPLACE FUNCTION z(numeric) RETURNS boolean AS $$
BEGIN
RETURN $1 <> 0;
END;
$$ LANGUAGE plpgSQL IMMUTABLE COST 1000;

-- This one works:
WITH T AS (
SELECT 1.0 AS v1, 0.0 AS v2
UNION ALL
SELECT 3.0, 1.0
UNION ALL
SELECT 2.0, 0.0
), a AS (
SELECT *
FROM t
WHERE z(v2)
)
SELECT *
FROM a
WHERE v1/v2 > 1.5;

-- This one gives 'division by zero':
WITH T AS (
SELECT 1.0 AS v1, 0.0 AS v2
UNION ALL
SELECT 3.0, 1.0
UNION ALL
SELECT 2.0, 0.0
)
SELECT *
FROM (
SELECT *
FROM t
WHERE z(v2)
) AS a
WHERE v1/v2 > 1.5;

From a non-hacker;

Just to se what other RDBMS are doing with CTEs; Look at slide 31 here: https://www.percona.com/live/17/sites/default/files/slides/Recursive%20Query%20Throwdown.pdf

PG is not on top wrt. CTE, but could have been if CTEs were not this "established" fence.

+1 for removing this fence and get all the possible optimization we can.

--
Andreas Joseph Krogh

Re: [HACKERS] CTE inlining

From

Thomas Kellerer

Date:

12 May 2017, 11:35:20

> Just to se what other RDBMS are doing with CTEs; Look at slide
> 31 here: https://www.percona.com/live/17/sites/default/files/slides/Recursive%20Query%20Throwdown.pdf

That is taken from Markus Winand's post:

https://twitter.com/MarkusWinand/status/852862475699707904
  "Seems like MySQL is getting the best WITH support of all tested DBs
(RECURSIVE not tested!)"



--
View this message in context: http://www.postgresql-archive.org/CTE-inlining-tp5958992p5961164.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: [HACKERS] CTE inlining

From

Bruce Momjian

Date:

12 May 2017, 23:39:46

On Tue, May  9, 2017 at 05:14:19PM -0400, Tom Lane wrote:
> Ilya Shkuratov <motr.ilya@ya.ru> writes:
> > Ok, it seems that most people in discussion are agree that removing optimization
> > fence is a right thing to do.
> > Nonetheless I still hoping to discuss the algorithm and its implementation. 
> 
> Yeah, so far we've mainly discussed whether to do that and how to control
> it, not what the actual results would be.

To summarize, it seems we have two options if we want to add fence
control to CTEs:

1.  add INLINE to disable the CTE fence
2.  add MATERIALIZE to enable the CTE fence

or some other keywords.  I think most people prefer #2 because:

*  most users writing queries prefer #2
*  most users assume full optimization and it seems natural to turn  _off_ an optimization via a keyword 
*  while some queries can be inlined, all queries can be materialized,  so doing #1 means INLINE would be only a
preference,which could be  confusing
 

Anyway, I am very glad we are considering addressing this in PG 11.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Re: [HACKERS] CTE inlining

From

Merlin Moncure

Date:

12 May 2017, 23:52:40

On Fri, May 12, 2017 at 3:39 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Tue, May  9, 2017 at 05:14:19PM -0400, Tom Lane wrote:
>> Ilya Shkuratov <motr.ilya@ya.ru> writes:
>> > Ok, it seems that most people in discussion are agree that removing optimization
>> > fence is a right thing to do.
>> > Nonetheless I still hoping to discuss the algorithm and its implementation.
>>
>> Yeah, so far we've mainly discussed whether to do that and how to control
>> it, not what the actual results would be.
>
> To summarize, it seems we have two options if we want to add fence
> control to CTEs:
>
> 1.  add INLINE to disable the CTE fence
> 2.  add MATERIALIZE to enable the CTE fence
>
> or some other keywords.  I think most people prefer #2 because:
>
> *  most users writing queries prefer #2

Yeah, I think there was rough consensus on this point.    I think it's
fair to assume that most (or at least a majority) of queries will
benefit from inlining, people would want to opt out of, rather than
opt in to, generally good optimization strategies.   This will hit
some in people today, but this is not a backwards compatibility issue
since performance is generally not really fairly described as
compatibility criteria.  If this feature drops we ought to warn people
in the release notes though.

merlin

Re: [HACKERS] CTE inlining

From

Adam Brusselback

Date:

15 May 2017, 18:57:20

From a user's perspective:

>I think most people prefer #2 because:

>* most users writing queries prefer #2

>* most users assume full optimization and it seems natural to turn

> _off_ an optimization via a keyword

>* while some queries can be inlined, all queries can be materialized,

> so doing #1 means INLINE would be only a preference, which could be

> confusing

I completely agree with this reasoning. I have a few queries I would have to touch to add "MATERIALIZED", but the vast majority of CTE's in my codebase would get a speedup. It would allow usage of CTE's more freely than now. I currently avoid them unless it really simplifies a query because of the optimization fence.

Not that my opinion holds any weight, but the extra keyword for enabling the optimization fence is my preference. By default trying to optimize more is a good thing IMO.

>Anyway, I am very glad we are considering addressing this in PG 11.

Seconded, this is a sore spot for me when using Postgres, and i'd love to not have it be an issue any more.

Thanks,

-Adam

Re: [HACKERS] CTE inlining

From

David Rowley

Date:

16 May 2017, 07:19:01

On 13 May 2017 at 08:39, Bruce Momjian <bruce@momjian.us> wrote:
> To summarize, it seems we have two options if we want to add fence
> control to CTEs:
>
> 1.  add INLINE to disable the CTE fence
> 2.  add MATERIALIZE to enable the CTE fence

I think #1 is out of the question.

What would we do in cases like:

WITH INLINE cte AS (SELECT random() a)
SELECT * FROM cte UNION SELECT * FROM cte;

I assume we won't want to inline when the CTE query contains a
volatile function, and we certainly won't in cases like:

WITH INLINE cte AS (DELETE FROM a RETURNING *)
INSERT INTO b SELECT * from cte WHERE cte.value > 5;

We'd be certain to receive complaints from disgruntled users about
"Why is this not inlined when I specified INLINE?"

#2 does not suffer from that.

-- David Rowley                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services