Thread: [HACKERS] CTE inlining
Hello, dear hackers! There is task in todo list about optional CTE optimization fence disabling. I am not interested at this point in disabling mechanism implementation, but I would like to discuss the optimization mechanism, that should work when the fence is disabled. It seems, that we can replace CTE with subquery, so the optimizer can do all available optimizations. This idea is quite straightforward, but I could not find a discussion of it. (Maybe it is so, because everyone knows that the idea is bad and it is not worth to discuss. But I hope it is not, so I start this thread. =)) First of all, to such replacement to be valid, the CTE must be 1. non-writable (e.g. be of form: SELECT ...), 2. donot use VOLATILE or STABLE functions, 3. ... (maybe there must be more restrictions?) Also, before inlining, we should check that some optimization can be applied, using functions from 'pull_up_subqueries_recurse' and 'subquery_push_qual'. If it is true, and there only one reference to CTE, we can inline it immediately. What it is not clear is how we should estimate whether it is worth to inline, when there is multiple references. Here are my preliminary ideas. Let consider "pull up subquery" and "push down qualifiers" cases separately. For "push down qualifiers", if `subquery_push_qual` is `true`, we can do the following: 1. copy CTE subquery, 2. push down quals, 3. find paths, 3. inline if cost of (CTEscan) > (cheapest_path(subquery) + subquery scan) Probably, this approach is not feasible, because it involves subquery replaning, and we should consider a more "lightweight" heuristic. For "pull up subquery" similar approach may lead to duplicate planning of the whole query, that almost sure is too expensive. So I wonder, is it possible to estimate a join predicate selectivity against CTE subquery result and inline it if selectivity is "high" enough? (If it is possible the same can be applied to the first case.) I would be glad to hear feedback on described approach. Ilya Shkuratov
On 30 Apr. 2017 07:56, "Ilya Shkuratov" <motr.ilya@ya.ru> wrote:
Hello, dear hackers!
There is task in todo list about optional CTE optimization fence
disabling.
I am not interested at this point in disabling mechanism
implementation, but I would like to discuss the optimization
mechanism, that should work when the fence is disabled.
It's looking at what other DBMSes do.
Notably MS SQL Server. AFAIK its CTEs are a lot like query-scoped views. They are simply updatable where possible, so you can write
WITH x AS (...)
UPDATE x SET ...
I do not know how MS SQL handles inlining and pullup/pushdown vs materialization, handles multiple evaluation costs, etc.
This is the model I would want to aim for.
It seems, that we can replace CTE with subquery, so the optimizer
can do all available optimizations. This idea is quite
straightforward, but I could not find a discussion of it.
(Maybe it is so, because everyone knows that the idea is bad and it is
not worth to discuss. But I hope it is not, so I start this thread. =))
It's not bad for SELECT.
But there are complexities.
- CTE terms may contain data-mutating functions people are relying on not multiply executing;
- we document that in postgres CTEs act as optimisation fences even with the standard syntax. So users rely on this as a query hint. Personally I want to relnotes this and tell people to use our OFFSET 0 hint instead, or add a NOINLINE option to our CTEs, then make pg allow inlining by default. This is a BC break, but not a big one if we restrict inlining of volatile. And since we don't have query hints (*cough*) by project policy, we can't really object to removing one can we?
- as you noted, it is hard to decide when it's worth inlining vs materializing for CTE terms referenced more than once. We should possibly start small and only inline single reference terms in the first release. We'd continue to force materializing of multiple reference terms.
That'd at least help people who use CTEs to write clearer queries not suffer for it. And it'd give us experience to help with conservatively introducing multiple reference inlining.
Craig Ringer <craig.ringer@2ndquadrant.com> writes: > - as you noted, it is hard to decide when it's worth inlining vs > materializing for CTE terms referenced more than once. [ raised eyebrow... ] Please explain why the answer isn't trivially "never". There's already a pretty large hill to climb here in the way of breaking peoples' expectations about CTEs being optimization fences. Breaking the documented semantics about CTEs being single-evaluation seems to me to be an absolute non-starter. regards, tom lane
2017-04-30 6:28 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
Craig Ringer <craig.ringer@2ndquadrant.com> writes:
> - as you noted, it is hard to decide when it's worth inlining vs
> materializing for CTE terms referenced more than once.
[ raised eyebrow... ] Please explain why the answer isn't trivially
"never".
There's already a pretty large hill to climb here in the way of
breaking peoples' expectations about CTEs being optimization
fences. Breaking the documented semantics about CTEs being
single-evaluation seems to me to be an absolute non-starter.
why we cannot to introduce GUC option - enable_cteoptfence ?
Regards
Pavel
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017-04-30 00:28:46 -0400, Tom Lane wrote: > There's already a pretty large hill to climb here in the way of > breaking peoples' expectations about CTEs being optimization > fences. Breaking the documented semantics about CTEs being > single-evaluation seems to me to be an absolute non-starter. If all referenced functions are non-volatile, I don't quite see the problem? Personally I believe we'll have to offer a proper anti-inlining workaround anyway, and in that case there's really nothing that should stop us from inlining CTE without volatile functions twice? - Andres
On 30 Apr. 2017 13:28, "Andres Freund" <andres@anarazel.de> wrote:
On 2017-04-30 00:28:46 -0400, Tom Lane wrote:If all referenced functions are non-volatile, I don't quite see the
> There's already a pretty large hill to climb here in the way of
> breaking peoples' expectations about CTEs being optimization
> fences. Breaking the documented semantics about CTEs being
> single-evaluation seems to me to be an absolute non-starter.
problem? Personally I believe we'll have to offer a proper
anti-inlining workaround anyway, and in that case there's really nothing
that should stop us from inlining CTE without volatile functions twice?
Exactly.
The initial implementation had limitations. So they got documented as features, not bugs or possible future enhancements. Yay? So we're stuck with it forever?
I agree we shouldn't break working, correct queries such that they return different results. But if someone is lying about volatility they don't get the expectation of correctness. And we have a policy against hints, so surely we should be keen to remove this hack that serves as a hint - right?
We have OFFSET 0 for anyone really depending on it, and at least when you read that you know to go "wtf" and look at the manual, wheras the CTE fence behaviour is invisible and silent.
Yes, experienced and established postgres users expect the optimisation fence behaviour. They abuse it as a query hint or work around it with subqueries in FROM. They also know OFFSET 0 ... and ideally should even read the relnotes. Users from other DMBSes looking to migrate, and new users, are regularly surprised by our CTEs. I see it a lot on Stack Overflow and other places outside our comfortable walls.
Personally I find it very annoying when I'd like to use CTEs to structure queries more readably, but land up having to use subqueries in FROM instead.
Like the work Andes has been doing on our bizarre handing of SRFs in the SELECT target list I really think it's just something that needs to be done.
Hi, On 2017-04-30 13:58:14 +0800, Craig Ringer wrote: > We have OFFSET 0 for anyone really depending on it, and at least when you > read that you know to go "wtf" and look at the manual, wheras the CTE fence > behaviour is invisible and silent. I don't think that's a good idea. What if you need to prevent inlining of something that actually needs an offset? What if the behaviour of offset is ever supposed to change? Relying more on that seems to just be repeating the mistake around CTEs. > Like the work Andes has been doing on our bizarre handing of SRFs in the > SELECT target list I really think it's just something that needs to be > done. With help from Tom, luckily... Greetings, Andres Freund
30.04.2017, 08:58, "Craig Ringer" <craig.ringer@2ndquadrant.com>:
On 30 Apr. 2017 13:28, "Andres Freund" <andres@anarazel.de> wrote:On 2017-04-30 00:28:46 -0400, Tom Lane wrote:If all referenced functions are non-volatile, I don't quite see the
> There's already a pretty large hill to climb here in the way of
> breaking peoples' expectations about CTEs being optimization
> fences. Breaking the documented semantics about CTEs being
> single-evaluation seems to me to be an absolute non-starter.
problem? Personally I believe we'll have to offer a proper
anti-inlining workaround anyway, and in that case there's really nothing
that should stop us from inlining CTE without volatile functions twice?Exactly.The initial implementation had limitations. So they got documented as features, not bugs or possible future enhancements. Yay? So we're stuck with it forever?I agree we shouldn't break working, correct queries such that they return different results. But if someone is lying about volatility they don't get the expectation of correctness. And we have a policy against hints, so surely we should be keen to remove this hack that serves as a hint - right?We have OFFSET 0 for anyone really depending on it, and at least when you read that you know to go "wtf" and look at the manual, wheras the CTE fence behaviour is invisible and silent.Yes, experienced and established postgres users expect the optimisation fence behaviour. They abuse it as a query hint or work around it with subqueries in FROM. They also know OFFSET 0 ... and ideally should even read the relnotes. Users from other DMBSes looking to migrate, and new users, are regularly surprised by our CTEs. I see it a lot on Stack Overflow and other places outside our comfortable walls.Personally I find it very annoying when I'd like to use CTEs to structure queries more readably, but land up having to use subqueries in FROM instead.Like the work Andes has been doing on our bizarre handing of SRFs in the SELECT target list I really think it's just something that needs to be done.
Also, I would like to remind that the disabling optimization fence is suggested to be OPTIONAL.
So we don't break peoples' expectations, nor documented semantics.
On 04/30/2017 06:28 AM, Tom Lane wrote: > Craig Ringer <craig.ringer@2ndquadrant.com> writes: >> - as you noted, it is hard to decide when it's worth inlining vs >> materializing for CTE terms referenced more than once. > > [ raised eyebrow... ] Please explain why the answer isn't trivially > "never". > > There's already a pretty large hill to climb here in the way of > breaking peoples' expectations about CTEs being optimization > fences. Breaking the documented semantics about CTEs being > single-evaluation seems to me to be an absolute non-starter. > I'm not sure that's a universal expectation, though. I know there are people who actually do rely on that intentionally, no doubt about that. And we'd nee to make it work for them. But I keep running into people who face serious performance issues exactly because not realizing this, and using CTEs as named subqueries. And when I tell them "optimization fence" they react "Whaaaaaaat?" If I had to make up some numbers, I'd say the "Whaaaaat?" group is about 10x the group of people who intentionally rely on CTEs being optimization fences. FWIW I don't know how to do this. There were multiple attempts at this in the past, none of them succeeded. But perhaps we could at least propagate some of the CTE features, so that the outside query can benefit from that (e.g. when the CTE is sorted, we could set the sortkeys). That wouldn't break the fence thing, but it would allow other stuff. regard -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 04/30/2017 09:46 AM, Andres Freund wrote: > Hi, > > On 2017-04-30 13:58:14 +0800, Craig Ringer wrote: >> We have OFFSET 0 for anyone really depending on it, and at least when you >> read that you know to go "wtf" and look at the manual, wheras the CTE fence >> behaviour is invisible and silent. > > I don't think that's a good idea. What if you need to prevent inlining > of something that actually needs an offset? What if the behaviour of > offset is ever supposed to change? Relying more on that seems to just > be repeating the mistake around CTEs. > I agree with this. But OFFSET 0 would force people to modify the queries anyway, so why not just introduce a new keyword instead? Something like: WITH FENCED a (SELECT ...) But I think something like that was proposed not too long ago, and did not make it for some reason. There's a lot of other CTE improvements that would be great. Say, being able to define indexes on them, but that's really a separate topic. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Apr 30, 2017 at 11:54:48PM +0200, Tomas Vondra wrote: > On 04/30/2017 06:28 AM, Tom Lane wrote: > > Craig Ringer <craig.ringer@2ndquadrant.com> writes: > > > - as you noted, it is hard to decide when it's worth inlining vs > > > materializing for CTE terms referenced more than once. > > > > [ raised eyebrow... ] Please explain why the answer isn't trivially > > "never". > > > > There's already a pretty large hill to climb here in the way of > > breaking peoples' expectations about CTEs being optimization > > fences. Breaking the documented semantics about CTEs being > > single-evaluation seems to me to be an absolute non-starter. > > > > I'm not sure that's a universal expectation, though. I know there > are people who actually do rely on that intentionally, no doubt > about that. And we'd nee to make it work for them. > > But I keep running into people who face serious performance issues > exactly because not realizing this, and using CTEs as named > subqueries. And when I tell them "optimization fence" they react > "Whaaaaaaat?" > > If I had to make up some numbers, I'd say the "Whaaaaat?" group is > about 10x the group of people who intentionally rely on CTEs being > optimization fences. I suspect you're off by at least a couple of orders of magnitude here, which make this even more important to deal with. Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On 2017-04-30 11:34:48 +0300, Ilya Shkuratov wrote: > Also, I would like to remind that the disabling optimization fence is suggested > to be OPTIONAL. > So we don't break peoples' expectations, nor documented semantics. I think however is that that's not good enough, because it'll surprise people forever, whereas using something as an intentional barrier usually is, well, intentional. Greetings, Andres Freund
On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote: > why we cannot to introduce GUC option - enable_cteoptfence ? Doesn't really solve the issue, and we've generally shied away from GUCs that influence behaviour after a few bad experiences. What if you want one CTE inlined, but another one not? - Andres
2017-05-01 1:21 GMT+02:00 Andres Freund <andres@anarazel.de>:
On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
> why we cannot to introduce GUC option - enable_cteoptfence ?
Doesn't really solve the issue, and we've generally shied away from GUCs
that influence behaviour after a few bad experiences. What if you want
one CTE inlined, but another one not?
It change behave in same sense like enable_nestloop, enable_hashjoin, ... with same limits.
Regards
Pavel
- Andres
On 05/01/2017 06:22 AM, Pavel Stehule wrote: > > > 2017-05-01 1:21 GMT+02:00 Andres Freund <andres@anarazel.de > <mailto:andres@anarazel.de>>: > > On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote: > > why we cannot to introduce GUC option - enable_cteoptfence ? > > Doesn't really solve the issue, and we've generally shied away from GUCs > that influence behaviour after a few bad experiences. What if you want > one CTE inlined, but another one not? > > > It change behave in same sense like enable_nestloop, enable_hashjoin, > ... with same limits. > Those (and also the other enable_*) GUCs are a great example why we should not use GUCs for tweaking planner behavior, except perhaps for the purpose of investigation. It's an extremely blunt tool. You typically want to affect just a single node in the query plan (say, one join), but those options don't allow you to do that. It's all or nothing thing. Even if you're OK with affecting the whole query, it's a separate control channel - it's not embedded in the query, the user has to set it somehow. So you either set it for the whole session (affecting all the other queries that don't really need it), or you set it before each query. Which however sucks for a number of reasons, e.g. if you have a slow query in the log, how do you know with what GUC values it was executed? (You don't, and there's no way to find out.) Exactly the same issues would affect this new GUC. It would be impossible to use multiple CTEs in the query with different fencing behavior, and it would be just as difficult to investigate. So no more planner-affecting GUCs, please, particularly if we expect regular users to use them. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 05/01/2017 09:05 AM, Tomas Vondra wrote: > On 05/01/2017 06:22 AM, Pavel Stehule wrote: >> >> >> 2017-05-01 1:21 GMT+02:00 Andres Freund <andres@anarazel.de >> <mailto:andres@anarazel.de>>: >> >> On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote: >> > why we cannot to introduce GUC option - enable_cteoptfence ? >> >> Doesn't really solve the issue, and we've generally shied away >> from GUCs >> that influence behaviour after a few bad experiences. What if >> you want >> one CTE inlined, but another one not? >> >> >> It change behave in same sense like enable_nestloop, enable_hashjoin, >> ... with same limits. >> > > Those (and also the other enable_*) GUCs are a great example why we > should not use GUCs for tweaking planner behavior, except perhaps for > the purpose of investigation. It's an extremely blunt tool. > > You typically want to affect just a single node in the query plan > (say, one join), but those options don't allow you to do that. It's > all or nothing thing. > > Even if you're OK with affecting the whole query, it's a separate > control channel - it's not embedded in the query, the user has to set > it somehow. So you either set it for the whole session (affecting all > the other queries that don't really need it), or you set it before > each query. Which however sucks for a number of reasons, e.g. if you > have a slow query in the log, how do you know with what GUC values it > was executed? (You don't, and there's no way to find out.) > > Exactly the same issues would affect this new GUC. It would be > impossible to use multiple CTEs in the query with different fencing > behavior, and it would be just as difficult to investigate. > > So no more planner-affecting GUCs, please, particularly if we expect > regular users to use them. > > +1 I still see users wanting to use the enable_foo settings in production. Having had years of telling users that CTEs are an optimization fence it doesn't seem at all nice for us to turn around and change our mind about that. I have relied on it in the past and I'm sure I'm very far from alone in that. Maybe we could allow a "decorator" that would tell the planner the CTE could be inlined? WITH INLINE mycte AS ( ...) cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1 May 2017 at 21:22, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > Having had years of telling users that CTEs are an optimization fence it > doesn't seem at all nice for us to turn around and change our mind about > that. I have relied on it in the past and I'm sure I'm very far from > alone in that. > > Maybe we could allow a "decorator" that would tell the planner the CTE > could be inlined? > > WITH INLINE mycte AS ( ...) I'd rather reverse that so we behave like other implementations by default, and have extension syntax for our no-inline query hint. And yes, that's what it is, because we'd only inline when we could produce semantically equivalent results anyway. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, May 01, 2017 at 09:22:42AM -0400, Andrew Dunstan wrote: > > So no more planner-affecting GUCs, please, particularly if we expect > > regular users to use them. > > +1 > > I still see users wanting to use the enable_foo settings in production. > > Having had years of telling users that CTEs are an optimization fence it > doesn't seem at all nice for us to turn around and change our mind about > that. I have relied on it in the past and I'm sure I'm very far from > alone in that. You are certainly not alone, but I believe that in this you're missing the vast majority (we hope) of PostgreSQL users. These are the users who have yet to adopt PostgreSQL, and have the quite reasonable expectation that ordinary-looking grammar *isn't* an optimization fence. > Maybe we could allow a "decorator" that would tell the planner the CTE > could be inlined? > > WITH INLINE mycte AS ( ...) +1 for a decorator, -1 for this one. We already have an explicit optimization fence with OFFSET 0, and I think making optimization fences explicit is how we should continue. I'd be more in favor of something along the lines of WITH FENCED /* Somewhat fuzzy. What fence? */ or WITH AT_MOST_ONCE /* Clearer, but not super precise */ or WITH UNIQUE_ATOMIC /* More descriptive, but not super clear without the docs in hand */ or something along that line. Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On 05/01/2017 04:17 PM, David Fetter wrote: >> Maybe we could allow a "decorator" that would tell the planner the CTE >> could be inlined? >> >> WITH INLINE mycte AS ( ...) > > +1 for a decorator, -1 for this one. I am not sure I like decorators since this means adding an ad hoc query hint directly into the SQL syntax which is something which I requires serious consideration. > We already have an explicit optimization fence with OFFSET 0, and I > think making optimization fences explicit is how we should continue. > I'd be more in favor of something along the lines of > > WITH FENCED /* Somewhat fuzzy. What fence? */ > or > WITH AT_MOST_ONCE /* Clearer, but not super precise */ > or > WITH UNIQUE_ATOMIC /* More descriptive, but not super clear without the docs in hand */ > > or something along that line. What about WITH MATERIALIZED, borrowing from the MySQL terminology "materialized subquery"? Andreas
On 05/01/2017 04:17 PM, David Fetter wrote:Maybe we could allow a "decorator" that would tell the planner the CTE
could be inlined?
WITH INLINE mycte AS ( ...)
+1 for a decorator, -1 for this one.
I am not sure I like decorators since this means adding an ad hoc query hint directly into the SQL syntax which is something which I requires serious consideration.
Given that we already have
"
prevent optimization"
syntax why do we need a decorator on the CTE?We already have an explicit optimization fence with OFFSET 0, and I
think making optimization fences explicit is how we should continue.
I'd be more in favor of something along the lines of
WITH FENCED /* Somewhat fuzzy. What fence? */
or
WITH AT_MOST_ONCE /* Clearer, but not super precise */
or
WITH UNIQUE_ATOMIC /* More descriptive, but not super clear without the docs in hand */
or something along that line.
What about WITH MATERIALIZED, borrowing from the MySQL terminology "materialized subquery"?
I would shorten that to "WITH MAT" except that I don't think that having two way to introduce an optimization fence is worthwhile.
If we don't optimize SRFs-in-target-list, and thus avoid multiple function evaluation for (composite_col).*, I believe a significant number of intentional optimization fence uses will be safe but behavioral changes.
David J.
On 05/01/2017 04:33 PM, David G. Johnston wrote:> On Mon, May 1, 2017 at 7:26 AM, Andreas Karlsson <andreas@proxel.se> I am not sure I like decorators since this means adding an ad hoc> query hint directly into the SQL syntax which issomething which I> requires serious consideration.>> Given that we already have> "> prevent optimization> "> syntaxwhy do we need a decorator on the CTE? I do not think I follow. Me and some other people here would ideally allow CTEs to be inlined by default. Some people today use CTEs as optimization fences, to for example control join order, and the suggestion here is to add new syntax for CTEs to allow them to selectively be used as optimization fences. > I would shorten that to "WITH MAT" except that I don't think that> having two way to introduce an optimization fence isworthwhile. You mean OFFSET 0? I have never been a fan of using it as an optimization fence. I do not think OFFSET 0 conveys clearly enough to the reader that is is an optimization fence. Andreas
On Mon, May 1, 2017 at 10:26 AM, Andreas Karlsson <andreas@proxel.se> wrote:
What about WITH MATERIALIZED, borrowing from the MySQL terminology "materialized subquery"?
+1, you beat me to it.
On 05/01/2017 04:33 PM, David G. Johnston wrote:
> On Mon, May 1, 2017 at 7:26 AM, Andreas Karlsson <andreas@proxel.se
> I am not sure I like decorators since this means adding an ad hoc
> query hint directly into the SQL syntax which is something which I
> requires serious consideration.
> I would shorten that to "WITH MAT" except that I don't think that
> having two way to introduce an optimization fence is worthwhile.
You mean OFFSET 0? I have never been a fan of using it as an optimization fence. I do not think OFFSET 0 conveys clearly enough to the reader that is is an optimization fence.
I think I was being too literal in my thinking. Proposing that we effectively do away with OFFSET 0 and instead recommend "WITH MAT name AS ()" for subqueries requiring standalone evaluation is something I can agree with. I too have always though of OFFSET 0 as hack-ish which is why my SRF usage examples have always used CTEs.
David J.
On 05/01/2017 10:17 AM, David Fetter wrote: > On Mon, May 01, 2017 at 09:22:42AM -0400, Andrew Dunstan wrote: >>> So no more planner-affecting GUCs, please, particularly if we expect >>> regular users to use them. >> +1 >> >> I still see users wanting to use the enable_foo settings in production. >> >> Having had years of telling users that CTEs are an optimization fence it >> doesn't seem at all nice for us to turn around and change our mind about >> that. I have relied on it in the past and I'm sure I'm very far from >> alone in that. > You are certainly not alone, but I believe that in this you're missing > the vast majority (we hope) of PostgreSQL users. These are the users > who have yet to adopt PostgreSQL, and have the quite reasonable > expectation that ordinary-looking grammar *isn't* an optimization > fence. > I am not in favor of seriously inconveniencing a significant number of our existing users for the sake of a bunch of people who don't now and might never in the future use Postgres. I think the bar for silent behaviour changes needs to be a bit higher than this one is. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1 May 2017 at 22:26, Andreas Karlsson <andreas@proxel.se> wrote: > I am not sure I like decorators since this means adding an ad hoc query hint > directly into the SQL syntax which is something which I requires serious > consideration. And mangling the semantics of existing syntax doesn't? That's what we do right now so we can pretend we don't have query hints while still having query hints. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Apr 30, 2017 at 5:54 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > But I keep running into people who face serious performance issues exactly > because not realizing this, and using CTEs as named subqueries. And when I > tell them "optimization fence" they react "Whaaaaaaat?" > > If I had to make up some numbers, I'd say the "Whaaaaat?" group is about 10x > the group of people who intentionally rely on CTEs being optimization > fences. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 1 May 2017 at 21:05, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > On 05/01/2017 06:22 AM, Pavel Stehule wrote: >> >> >> >> 2017-05-01 1:21 GMT+02:00 Andres Freund <andres@anarazel.de >> <mailto:andres@anarazel.de>>: >> >> On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote: >> > why we cannot to introduce GUC option - enable_cteoptfence ? >> >> Doesn't really solve the issue, and we've generally shied away from >> GUCs >> that influence behaviour after a few bad experiences. What if you >> want >> one CTE inlined, but another one not? >> >> >> It change behave in same sense like enable_nestloop, enable_hashjoin, >> ... with same limits. >> > > Those (and also the other enable_*) GUCs are a great example why we should > not use GUCs for tweaking planner behavior, except perhaps for the purpose > of investigation. It's an extremely blunt tool. Definitely agree. You can't use them to affect only a portion of a query. In fact there's no way to automatically scope them to one query at all. They're also very big hammers, and as we introduce new types of plan nodes they won't offer comprehensive control without being amended. They're a tool of last resort for dealing with problems. > Exactly the same issues would affect this new GUC. It would be impossible to > use multiple CTEs in the query with different fencing behavior, and it would > be just as difficult to investigate. Yeah, I think a GUC is a non-starter. If we want fence behaviour, we should require people to declare their desire for fence behaviour, rather than treating it as a sort of hint-as-a-bug that we grandfather in because we're so desperate not to admit we have hints. Before forming a strong view on this, I strongly everyone stick your head outside the postgresql.org lists for a while. In my experience even regular postgres users I see pop up on places like Stack Overflow tend to react to this behaviour with "WTF?!" and wonder why we haven't fixed this limitation yet, viewing it as a bug not a feature. The same logic being applied here should've prevented us from ever introducing: * inlining of SQL functions * inlining of views * inlining of subqueries ... but somehow, this one is different. We're not even removing functionality. You can still use the OFFSET 0 hack. If you need a nonzero offset that's fine, because we don't inline over OFFSET anyway. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2 May 2017 at 10:45, Craig Ringer <craig@2ndquadrant.com> wrote: > If we want fence behaviour, we should require people to declare their > desire for fence behaviour, rather than treating it as a sort of > hint-as-a-bug that we grandfather in because we're so desperate not to > admit we have hints. Answers to my question here https://dba.stackexchange.com/q/27425/7788 suggest that there's no justification in the standard for treating it as a mandatory fence. Here https://www.postgresql.org/message-id/29918.1320244719@sss.pgh.pa.us Tom says "CTEs are also treated as optimization fences; this is not so much an optimizer limitation as to keep the semantics sane when the CTE contains a writable query." In my view, if we address that by walking the query tree for volatile functions, we should be perfectly fine to inline single-reference CTEs. Is that too simplistic? Is the sticking point simply this docs text: "A useful property of WITH queries is that they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is less able to push restrictions from the parent query down into a WITH query than an ordinary subquery. The WITH query will generally be evaluated as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.)" ? ... because to me, that seems to offer a great deal of wiggle room in "less able", "generally", etc, even if we treated the docs as a hard specification of future behaviour. (Which we don't, since we change things and document the changes). We may also stop evaluation early, so there's already no guarantee the full query is run. Is there any *technical* rather than policy reason we could not safely inline a CTE term referenced by only one query, where the CTE term is a SELECT, contains no volatile function calls, is not RECURSIVE, and does not its self reference another non-inlineable CTE term? Josh Kupershmidt, in a comment on my blog post on this topic some time ago (https://blog.2ndquadrant.com/postgresql-ctes-are-optimization-fences/), pointed out that "One notable exception to the “(non-writeable) CTEs are always materialized” rule is that if a non-writeable CTE node is not referenced anywhere, it won’t actually be evaluated. For example, this query returns 1 instead of bombing out: WITH not_executed AS (SELECT 1/0), executed AS (SELECT 1) SELECT * FROM executed;" Prior discussions: * https://www.postgresql.org/message-id/201209191305.44674.db@kavod.com * http://archives.postgresql.org/pgsql-performance/2011-10/msg00208.php * https://www.postgresql.org/message-id/29918.1320244719@sss.pgh.pa.us Relevant posts where users get confused by our behaviour: * https://dba.stackexchange.com/q/127828/7788 * https://news.ycombinator.com/item?id=7023907 * https://dba.stackexchange.com/q/84760/7788 * https://dba.stackexchange.com/q/97393/7788 * http://stackoverflow.com/q/20403792/398670 * http://stackoverflow.com/q/33731068/398670 * .... Also, comments on my post: * https://blog.2ndquadrant.com/postgresql-ctes-are-optimization-fences/ -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 05/02/2017 04:38 AM, Craig Ringer wrote: > On 1 May 2017 at 22:26, Andreas Karlsson <andreas@proxel.se> wrote: >> I am not sure I like decorators since this means adding an ad hoc query hint >> directly into the SQL syntax which is something which I requires serious >> consideration. > > And mangling the semantics of existing syntax doesn't? > > That's what we do right now so we can pretend we don't have query > hints while still having query hints. I am in favor of removing the optimization fence from CTEs, and strongly prefer no fence being the default behavior since SQL is a declarative language and I think it is reasonable to assume that CTEs can be inlined. But the question is how to best remove the fence while taking into account that quite many use them as optimization fences today. I see some alternatives, none of them perfect. 1. Just remove the optimization fence and let people add OFFSET 0 to their queries if they want an optimization fence. This lets us keep pretending that we do not have query hints (and therefore do not have to formalize any syntax for them) while still allowing people to add optimization fences. 2. Add a decorator for WITH (e.g. "WITH MATERIALIZED x (...") to add an explicit optimization fence. This will for the first time add official support for a query hint in the syntax which is a quite big precedent. 3. Add a new GUC which can enable and disable the optimization fence. This is a very clumsy tool, but maybe good enough for some users and some people here in this thread have complained about our similar GUCs. 4. Add some new more generic query hinting facility. This is a lot of work and something which would be very hard to get consensus for. Andreas
On Mon, May 1, 2017 at 7:22 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > > > 2017-05-01 1:21 GMT+02:00 Andres Freund <andres@anarazel.de>: >> >> On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote: >> > why we cannot to introduce GUC option - enable_cteoptfence ? >> >> Doesn't really solve the issue, and we've generally shied away from GUCs >> that influence behaviour after a few bad experiences. What if you want >> one CTE inlined, but another one not? > > > It change behave in same sense like enable_nestloop, enable_hashjoin, ... > with same limits. And then we recall plan hints :) > > Regards > > Pavel > >> >> >> - Andres > >
On Sun, Apr 30, 2017 at 6:21 PM, Andres Freund <andres@anarazel.de> wrote: > On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote: >> why we cannot to introduce GUC option - enable_cteoptfence ? > > Doesn't really solve the issue, and we've generally shied away from GUCs > that influence behaviour after a few bad experiences. What if you want > one CTE inlined, but another one not? Yeah. Are we absolutely opposed to SQL syntax against WITH that allows or disallows fencing? for example, WITH [MATERIALIZED] Pushing people to OFFSET 0 is a giant step backwards IMO, and as in implementation detail is also subject to change. merlin
On 05/02/2017 10:13 AM, Merlin Moncure wrote: > On Sun, Apr 30, 2017 at 6:21 PM, Andres Freund <andres@anarazel.de> wrote: >> On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote: >>> why we cannot to introduce GUC option - enable_cteoptfence ? >> Doesn't really solve the issue, and we've generally shied away from GUCs >> that influence behaviour after a few bad experiences. What if you want >> one CTE inlined, but another one not? > Yeah. Are we absolutely opposed to SQL syntax against WITH that > allows or disallows fencing? for example, > > WITH [MATERIALIZED] > > Pushing people to OFFSET 0 is a giant step backwards IMO, and as in > implementation detail is also subject to change. > > Agreed, it's an ugly as sin and completely non-obvious hack. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 5/2/17 4:44 PM, Andrew Dunstan wrote: > > > On 05/02/2017 10:13 AM, Merlin Moncure wrote: >> On Sun, Apr 30, 2017 at 6:21 PM, Andres Freund <andres@anarazel.de> wrote: >>> On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote: >>>> why we cannot to introduce GUC option - enable_cteoptfence ? >>> Doesn't really solve the issue, and we've generally shied away from GUCs >>> that influence behaviour after a few bad experiences. What if you want >>> one CTE inlined, but another one not? >> Yeah. Are we absolutely opposed to SQL syntax against WITH that >> allows or disallows fencing? for example, >> >> WITH [MATERIALIZED] >> >> Pushing people to OFFSET 0 is a giant step backwards IMO, and as in >> implementation detail is also subject to change. >> >> > > Agreed, it's an ugly as sin and completely non-obvious hack. > Isn't OFFSET 0 an implementation detail anyway? Who says the planner couldn't get smarter in the future, realize OFFSET 0 is no-op? In that case replacing CTE optimization fence with "OFFSET 0" would be akin to painting yourself into a corner, waiting for the pain to dry, walking over to another corner and painting yourself into that one. cheers -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> Relevant posts where users get confused by our behaviour: > .... And Markus Winand's blog: http://modern-sql.com/feature/with/performance Databases generally obey this principle, although PostgreSQL represents a big exception and Besides PostgreSQL, all tested databases optimize with clauses in the same way that they optimize views or derived tables" -- View this message in context: http://www.postgresql-archive.org/CTE-inlining-tp5958992p5959313.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
On Tue, May 02, 2017 at 02:40:55PM +0200, Andreas Karlsson wrote: > On 05/02/2017 04:38 AM, Craig Ringer wrote: > > On 1 May 2017 at 22:26, Andreas Karlsson <andreas@proxel.se> wrote: > > > I am not sure I like decorators since this means adding an ad hoc query hint > > > directly into the SQL syntax which is something which I requires serious > > > consideration. > > > > And mangling the semantics of existing syntax doesn't? > > > > That's what we do right now so we can pretend we don't have query > > hints while still having query hints. > > I am in favor of removing the optimization fence from CTEs, and strongly > prefer no fence being the default behavior since SQL is a declarative > language and I think it is reasonable to assume that CTEs can be inlined. > But the question is how to best remove the fence while taking into account > that quite many use them as optimization fences today. > > I see some alternatives, none of them perfect. > > 1. Just remove the optimization fence and let people add OFFSET 0 to their > queries if they want an optimization fence. This lets us keep pretending > that we do not have query hints (and therefore do not have to formalize any > syntax for them) while still allowing people to add optimization fences. +1 I get that people with gigantic PostgreSQL installations with stringent performance requirements sometimes need to do odd things to squeeze out the last few percentage points of performance. As the people (well, at least the people close to the ground) at these organizations are fully aware, performance optimizations are extremely volatile with respect to new versions of software, whether it's PostgreSQL, Oracle, the Linux kernel, or what have you. They expect this, and they have processes in place to handle it. If they don't, it's pilot error. We should not be penalizing all our other users to maintain the fiction that people can treat performance optimizations as a "fire and forget" matter. > 2. Add a decorator for WITH (e.g. "WITH MATERIALIZED x (...") to add an > explicit optimization fence. This will for the first time add official > support for a query hint in the syntax which is a quite big precedent. Yep. It's one we should think very carefully before we introduce. > 3. Add a new GUC which can enable and disable the optimization fence. This > is a very clumsy tool, but maybe good enough for some users and some people > here in this thread have complained about our similar GUCs. Any GUC would be unable to distinguish one WITH clause from another. The hammer would then be guaranteed to be too big for precisely the cases where it's most needed. > 4. Add some new more generic query hinting facility. This is a lot > of work and something which would be very hard to get consensus for. Just the design of the thing would be the work of months at a minimum, assuming we got to some consensus at all. Maybe it's worth doing. Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
I get that people with gigantic PostgreSQL installations with
stringent performance requirements sometimes need to do odd things to
squeeze out the last few percentage points of performance. As the
people (well, at least the people close to the ground) at these
organizations are fully aware, performance optimizations are extremely
volatile with respect to new versions of software, whether it's
PostgreSQL, Oracle, the Linux kernel, or what have you. They expect
this, and they have processes in place to handle it. If they don't,
it's pilot error.
Well put. People on the ground in those situations go to great lengths to freeze the query plan as-is. For them, an upgrade is something that is done after months of planning. They might be surprised by the dropping of this optimization fence, but the surprise won't be in production, and they've got just as good of chance of being pleasantly surprised.
> 2. Add a decorator for WITH (e.g. "WITH MATERIALIZED x (...") to add an
> explicit optimization fence. This will for the first time add official
> support for a query hint in the syntax which is a quite big precedent.
Yep. It's one we should think very carefully before we introduce.
There's a tiny, oblique precedence for this with Oracle's WITH [FUNCTION | PROCEDURE] syntax. In both cases, the user is able to create an ephemeral object that can be referenced later in the query. Good idea or bad, it's a sign that others have been fine with that conceptual path.
Personally, I'm fine with WITH MATERIALIZED, but I'm also ok with just not promising the fence. I think there is value in letting users break up a complex query into understandable WITH-chunks, and that value shouldn't prevent good performance. The fence will probably still be there anyway in the case of INSERT/UPDATE RETURNING and cases where a CTE is referenced more than once in the query that follows.
On 5/2/17 6:34 PM, David Fetter wrote: > On Tue, May 02, 2017 at 02:40:55PM +0200, Andreas Karlsson wrote: >> On 05/02/2017 04:38 AM, Craig Ringer wrote: >>> On 1 May 2017 at 22:26, Andreas Karlsson <andreas@proxel.se> wrote:>> >> ... >> >> I see some alternatives, none of them perfect. >> >> 1. Just remove the optimization fence and let people add OFFSET 0 to their >> queries if they want an optimization fence. This lets us keep pretending >> that we do not have query hints (and therefore do not have to formalize any >> syntax for them) while still allowing people to add optimization fences. > > +1 > > I get that people with gigantic PostgreSQL installations with > stringent performance requirements sometimes need to do odd things to > squeeze out the last few percentage points of performance. As the > people (well, at least the people close to the ground) at these > organizations are fully aware, performance optimizations are extremely > volatile with respect to new versions of software, whether it's > PostgreSQL, Oracle, the Linux kernel, or what have you. They expect > this, and they have processes in place to handle it. If they don't, > it's pilot error. > > We should not be penalizing all our other users to maintain the > fiction that people can treat performance optimizations as a "fire and > forget" matter. > Agreed. >> 2. Add a decorator for WITH (e.g. "WITH MATERIALIZED x (...") to add an >> explicit optimization fence. This will for the first time add official >> support for a query hint in the syntax which is a quite big precedent. > > Yep. It's one we should think very carefully before we introduce. > I think it's a mistake to see this as an introduction of query hits. Firstly, it's a question whether it qualifies as a hint. I wouldn't call it a hint, but let's assume there is a definition of query hints that includes WITH MATERIALIZED. More importantly, however, this is not introducing anything new. It's just a different name for the current "WITH" semantics, and you can achieve the same behavior by "OFFSET 0". And people are already using these as hints, so I fail to see how this introduces anything new. In fact, if you see the optimization fence as an implicit query hint, this actually *removes* a hint (although most users are unaware of that behavior and use it unintentionally). >> 3. Add a new GUC which can enable and disable the optimization fence. This >> is a very clumsy tool, but maybe good enough for some users and some people >> here in this thread have complained about our similar GUCs. > > Any GUC would be unable to distinguish one WITH clause from another. > The hammer would then be guaranteed to be too big for precisely the > cases where it's most needed. > If I could, I'd give -1 million to a GUC-based approach, as that would make it entirely unusable in practice, I think. Actually, I can give -1 million, so I'm giving it. >> >> 4. Add some new more generic query hinting facility. This is a lot >> of work and something which would be very hard to get consensus for. > > Just the design of the thing would be the work of months at a minimum, > assuming we got to some consensus at all. Maybe it's worth doing. > While I came to conclusion that query hints may be quite useful in some situations, I'm pretty sure this is not a battle you'd like to fight. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
\On Tue, May 2, 2017 at 12:05 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > On 5/2/17 6:34 PM, David Fetter wrote: >> >> On Tue, May 02, 2017 at 02:40:55PM +0200, Andreas Karlsson wrote: >>> >>> On 05/02/2017 04:38 AM, Craig Ringer wrote: >>>> >>>> On 1 May 2017 at 22:26, Andreas Karlsson <andreas@proxel.se> wrote: > >>> >>> >>> ... >>> >>> I see some alternatives, none of them perfect. >>> >>> 1. Just remove the optimization fence and let people add OFFSET 0 to >>> their >>> queries if they want an optimization fence. This lets us keep pretending >>> that we do not have query hints (and therefore do not have to formalize >>> any >>> syntax for them) while still allowing people to add optimization fences. >> >> >> +1 >> >> I get that people with gigantic PostgreSQL installations with >> stringent performance requirements sometimes need to do odd things to >> squeeze out the last few percentage points of performance. As the >> people (well, at least the people close to the ground) at these >> organizations are fully aware, performance optimizations are extremely >> volatile with respect to new versions of software, whether it's >> PostgreSQL, Oracle, the Linux kernel, or what have you. They expect >> this, and they have processes in place to handle it. If they don't, >> it's pilot error. >> >> We should not be penalizing all our other users to maintain the >> fiction that people can treat performance optimizations as a "fire and >> forget" matter. >> > > Agreed. > >>> 2. Add a decorator for WITH (e.g. "WITH MATERIALIZED x (...") to add an >>> explicit optimization fence. This will for the first time add official >>> support for a query hint in the syntax which is a quite big precedent. >> >> >> Yep. It's one we should think very carefully before we introduce. >> > > I think it's a mistake to see this as an introduction of query hits. > > Firstly, it's a question whether it qualifies as a hint. I wouldn't call it > a hint, but let's assume there is a definition of query hints that includes > WITH MATERIALIZED. > > More importantly, however, this is not introducing anything new. It's just a > different name for the current "WITH" semantics, and you can achieve the > same behavior by "OFFSET 0". And people are already using these as hints, so > I fail to see how this introduces anything new. > > In fact, if you see the optimization fence as an implicit query hint, this > actually *removes* a hint (although most users are unaware of that behavior > and use it unintentionally). +1 down the line. More to the point, for several years now we've (or at least I, but I'm not the only one) have been advocating for the usage of CTE to avoid the undocumented and bizarre OFFSET 0 trick. Jerking this out from users without giving a simple mechanic to get the same behavior minus a major query rewrite is blatantly user hostile. I can't believe we're even contemplating it. Also a GUC is not a solution for pretty obvious reasons I think. merlin
On 5/2/17 11:23 PM, Merlin Moncure wrote: > \On Tue, May 2, 2017 at 12:05 PM, Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: >> On 5/2/17 6:34 PM, David Fetter wrote: >>> >>> On Tue, May 02, 2017 at 02:40:55PM +0200, Andreas Karlsson wrote: >>>> >>>> On 05/02/2017 04:38 AM, Craig Ringer wrote: >>>>> >>>>> On 1 May 2017 at 22:26, Andreas Karlsson <andreas@proxel.se> wrote: >> >>>> >>>> >>>> ... >>>> >>>> I see some alternatives, none of them perfect. >>>> >>>> 1. Just remove the optimization fence and let people add OFFSET 0 to >>>> their >>>> queries if they want an optimization fence. This lets us keep pretending >>>> that we do not have query hints (and therefore do not have to formalize >>>> any >>>> syntax for them) while still allowing people to add optimization fences. >>> >>> >>> +1 >>> >>> I get that people with gigantic PostgreSQL installations with >>> stringent performance requirements sometimes need to do odd things to >>> squeeze out the last few percentage points of performance. As the >>> people (well, at least the people close to the ground) at these >>> organizations are fully aware, performance optimizations are extremely >>> volatile with respect to new versions of software, whether it's >>> PostgreSQL, Oracle, the Linux kernel, or what have you. They expect >>> this, and they have processes in place to handle it. If they don't, >>> it's pilot error. >>> >>> We should not be penalizing all our other users to maintain the >>> fiction that people can treat performance optimizations as a "fire and >>> forget" matter. >>> >> >> Agreed. >> >>>> 2. Add a decorator for WITH (e.g. "WITH MATERIALIZED x (...") to add an >>>> explicit optimization fence. This will for the first time add official >>>> support for a query hint in the syntax which is a quite big precedent. >>> >>> >>> Yep. It's one we should think very carefully before we introduce. >>> >> >> I think it's a mistake to see this as an introduction of query hits. >> >> Firstly, it's a question whether it qualifies as a hint. I wouldn't call it >> a hint, but let's assume there is a definition of query hints that includes >> WITH MATERIALIZED. >> >> More importantly, however, this is not introducing anything new. It's just a >> different name for the current "WITH" semantics, and you can achieve the >> same behavior by "OFFSET 0". And people are already using these as hints, so >> I fail to see how this introduces anything new. >> >> In fact, if you see the optimization fence as an implicit query hint, this >> actually *removes* a hint (although most users are unaware of that behavior >> and use it unintentionally). > > +1 down the line. More to the point, for several years now we've (or > at least I, but I'm not the only one) have been advocating for the > usage of CTE to avoid the undocumented and bizarre OFFSET 0 trick. > Jerking this out from users without giving a simple mechanic to get > the same behavior minus a major query rewrite is blatantly user > hostile. I can't believe we're even contemplating it. Also a GUC is > not a solution for pretty obvious reasons I think. > I'm not sure what you mean by "jerking this out from users". Isn't most of this thread about how to allow CTE inlining without hurting users unnecessarily? I think we agree that: * Just removing the optimization fence and telling users to use OFFSET 0 instead is a no-go, just like removing the fencing and not providing any sensible replacement. * GUC is not the solution. Which leaves us with either WITH INLINE or WITH MATERIALIZE, or something along those lines. If we go with WITH INLINE then we're likely not solving anything, because most people will simply use WITH just like now, and will be subject to the fencing without realizing it. Or we will choose WITH MATERIALIZE, and then the users aware of the fencing (and using the CTEs for that purpose) will have to modify the queries. But does adding MATERIALIZE quality as major query rewrite? Perhaps combining this with a GUC would be a solution. I mean, a GUC specifying the default behavior, and then INLINE / MATERIALIZE for individual CTEs in a query? If you have an application intentionally using CTEs as a fence, just do ALTER DATABASE x SET enable_guc_fencing = on and you don't have to rewrite the queries. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3 May 2017 at 07:00, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > I'm not sure what you mean by "jerking this out from users". Isn't most of > this thread about how to allow CTE inlining without hurting users > unnecessarily? He's referring to Andreas Karlsson <andreas@proxel.se> wrote: > 1. Just remove the optimization fence and let people add OFFSET 0 to their queries if they want an optimization fence.This lets us keep pretending that we do not have query hints (and therefore do not have to formalize any syntax forthem) while still allowing people to add optimization fences. ... and the various agreement expressed with it. I wish that's what had been done in the first place, but it's a bit harsh on users to do it now. We can keep living in fantasty-land where we deny hints while telling people to use a washy-worded semi-documented hint to work around issues. Or we can take responsibility and admit we've had to grandfather a limitation in as a blessed hint. Then unblock our way toward this performance enhancement / user foot-gun without yanking the rug out from under users who've been relying on our odd implementation detail to date: * Add "WITH MATERIALIZED" or "WITH NO_INLINE" or whatever * Change "WITH" to inline where safe, initially only for single reference terms * Mention the change in the release notes Note that I'm in no way advocating real hints. I won't say I love all aspects of project policy there, but it's clear there are benefits, and that hints bring plenty of problems. If this weren't already documented and widely advocated as a way to "fix" poor pushdown/pullup/inlining decisions I'd be firmly in the "just fix it" camp. > * Just removing the optimization fence and telling users to use OFFSET 0 > instead is a no-go, just like removing the fencing and not providing any > sensible replacement. I could tolerate telling people to use OFFSET 0 (and documenting it!) as a workaround if we can't get something more friendly in. At least then we're back down to one hint-not-a-hint instead of two, and it doesn't really block any useful optimisations. > * GUC is not the solution. Strong agreement. > If we go with WITH INLINE then we're likely not solving anything, because > most people will simply use WITH just like now, and will be subject to the > fencing without realizing it. Yes, and we're missing the opportunity to confirm with what other systems do, and the spirit of the SQL language's declare what I want, not how to do it, model. > Or we will choose WITH MATERIALIZE, and then the users aware of the fencing > (and using the CTEs for that purpose) will have to modify the queries. But > does adding MATERIALIZE quality as major query rewrite? Hardly. > Perhaps combining this with a GUC would be a solution. I mean, a GUC > specifying the default behavior, and then INLINE / MATERIALIZE for > individual CTEs in a query? It'd be nice if we could do that for a couple of releases as an interim measure, but people will then get locked into relying on it, and we'll never be able to remove it. It's not like we don't make users do other things for upgrades, and don't change performance in other ways. We don't even HAVE a performance-test farm to look at regressions, planner behaviour changes, etc. Yes, I know it's hard and nobody's volunteering, the point is, we're hardly free of undocumented and unintentional query-specific regressions let alone documented and relnoted ones. So a sad -1 to me for a GUC. Anyone big enough to be significantly upset by this planner change will have a QA/staging deployment system anyway. Or should, because we make enough other changes in a major release to make their life way more interesting than this! -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 05/02/2017 07:00 PM, Tomas Vondra wrote: > > > > I think we agree that: > > * Just removing the optimization fence and telling users to use OFFSET > 0 instead is a no-go, just like removing the fencing and not providing > any sensible replacement. > > * GUC is not the solution. yes > > Which leaves us with either WITH INLINE or WITH MATERIALIZE, or > something along those lines. > > If we go with WITH INLINE then we're likely not solving anything, > because most people will simply use WITH just like now, and will be > subject to the fencing without realizing it. In many cases it won't actually matter much. We're going to penalize some group of users no matter what we do. It just seems a pity that it might be the group who actually took us at our word. It's no skin off my nose - I will happily spend time finding places where this will make things worse for customers. > > Or we will choose WITH MATERIALIZE, and then the users aware of the > fencing (and using the CTEs for that purpose) will have to modify the > queries. But does adding MATERIALIZE quality as major query rewrite? It's not a major rewrite. But I can think of at least one former customer who will have to go through a heck of a lot of code finding where to add that one word. > > Perhaps combining this with a GUC would be a solution. I mean, a GUC > specifying the default behavior, and then INLINE / MATERIALIZE for > individual CTEs in a query? > > If you have an application intentionally using CTEs as a fence, just do > > ALTER DATABASE x SET enable_guc_fencing = on > > and you don't have to rewrite the queries. > ITYM enable_cte_fencing. I'm not sure it will help all that much unless we provide both decorator variants, so people can remediate their code one query at a time, and this guc would just govern the default. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, May 3, 2017 at 2:19 AM, Craig Ringer <craig@2ndquadrant.com> wrote: >> Or we will choose WITH MATERIALIZE, and then the users aware of the fencing >> (and using the CTEs for that purpose) will have to modify the queries. But >> does adding MATERIALIZE quality as major query rewrite? > > Hardly. > >> Perhaps combining this with a GUC would be a solution. I mean, a GUC >> specifying the default behavior, and then INLINE / MATERIALIZE for >> individual CTEs in a query? > > It'd be nice if we could do that for a couple of releases as an > interim measure, but people will then get locked into relying on it, > and we'll never be able to remove it. The proposed guc seems like a good idea, without which ORMs that support CTEs would be at a loss. People using those ORMs that need materialized behavior would have to wait for the ORM to catch up with postgres syntax before upgrading, and that wouldn't be a nice thing. It's not about requiring testing before upgrading, of course users should/will do that. But if said testing says inlined CTEs perform horribly, and the ORM has no support for the materialized keyword, the only option is to not upgrade. With the CTE, people can upgrade, changing the default behavior back to what it was. That seems to me a useful thing to have.
On Wed, May 03, 2017 at 11:26:27AM -0300, Claudio Freire wrote: > On Wed, May 3, 2017 at 2:19 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > >> Or we will choose WITH MATERIALIZE, and then the users aware of > >> the fencing (and using the CTEs for that purpose) will have to > >> modify the queries. But does adding MATERIALIZE quality as major > >> query rewrite? > > > > Hardly. > > > >> Perhaps combining this with a GUC would be a solution. I mean, a > >> GUC specifying the default behavior, and then INLINE / > >> MATERIALIZE for individual CTEs in a query? > > > > It'd be nice if we could do that for a couple of releases as an > > interim measure, but people will then get locked into relying on > > it, and we'll never be able to remove it. > > The proposed guc seems like a good idea, without which ORMs that > support CTEs would be at a loss. Are you aware of such an ORM which both supports WITH and doesn't also closely track PostgreSQL development? I'm not. Even assuming that such a thing exists, it's not at all obvious to me that we should be stalling and/or putting in what will turn out to be misfeatures to accommodate it. Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
> I could tolerate telling people to use OFFSET 0 (and documenting it!) > as a workaround if we can't get something more friendly in. I agree with that. > If we go with WITH INLINE then we're likely not solving anything, because > most people will simply use WITH just like now, and will be subject to the > fencing without realizing it. I agree - the default behaviour should be change to match what everybody expects. The current behaviour should be the exception. > Yes, and we're missing the opportunity to confirm with what other > systems do, and the spirit of the SQL language's declare what I want, > not how to do it, model. Essentially *all* other systems optimize CTEs the same way they optimize derived tables. I think even MySQL does it like that in the upcoming 8.0 release. I have never heard anyone saying that the Postgres implementation is an advantage and that they would hate to see this disappear. I usually hear "Why is Postgres doing that? Can't they change that?" Maybe I have a limited view on this, but from where I stand, simply changing it would help everybody I know and would not break anything. I don't even think a replacement for the old behaviour would be necessary. -- View this message in context: http://www.postgresql-archive.org/CTE-inlining-tp5958992p5959509.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
On Wed, May 3, 2017 at 11:31 AM, David Fetter <david@fetter.org> wrote: > On Wed, May 03, 2017 at 11:26:27AM -0300, Claudio Freire wrote: >> On Wed, May 3, 2017 at 2:19 AM, Craig Ringer <craig@2ndquadrant.com> wrote: >> >> Or we will choose WITH MATERIALIZE, and then the users aware of >> >> the fencing (and using the CTEs for that purpose) will have to >> >> modify the queries. But does adding MATERIALIZE quality as major >> >> query rewrite? >> > >> > Hardly. >> > >> >> Perhaps combining this with a GUC would be a solution. I mean, a >> >> GUC specifying the default behavior, and then INLINE / >> >> MATERIALIZE for individual CTEs in a query? >> > >> > It'd be nice if we could do that for a couple of releases as an >> > interim measure, but people will then get locked into relying on >> > it, and we'll never be able to remove it. >> >> The proposed guc seems like a good idea, without which ORMs that >> support CTEs would be at a loss. > > Are you aware of such an ORM which both supports WITH and doesn't also > closely track PostgreSQL development? I'm not. > > Even assuming that such a thing exists, it's not at all obvious to me > that we should be stalling and/or putting in what will turn out to be > misfeatures to accommodate it. I know SQLAlchemy does support CTEs, and lags quite considerably in its support of the latest syntactic elements. For instance, it took them 8 months to support the "skip locked" option. Not sure whether that qualifies as "closely tracking" postgres for you. Clearly they do track it, but that doesn't mean they're fast or as fast as one would like/need. Sure, that might not be enough to warrant the GUC. I would think so, those are my 2 cents. YMMV.
On Wed, May 03, 2017 at 01:27:38PM -0300, Claudio Freire wrote: > On Wed, May 3, 2017 at 11:31 AM, David Fetter <david@fetter.org> wrote: > > Are you aware of such an ORM which both supports WITH and doesn't > > also closely track PostgreSQL development? I'm not. > > > > Even assuming that such a thing exists, it's not at all obvious to > > me that we should be stalling and/or putting in what will turn out > > to be misfeatures to accommodate it. > > I know SQLAlchemy does support CTEs, and lags quite considerably in > its support of the latest syntactic elements. > > For instance, it took them 8 months to support the "skip locked" > option. That is pretty strictly their problem. > Not sure whether that qualifies as "closely tracking" postgres for > you. Clearly they do track it, but that doesn't mean they're fast or > as fast as one would like/need. We can NOT make their tardiness a driver of our development. > Sure, that might not be enough to warrant the GUC. I would think so, > those are my 2 cents. YMMV. When we add a "temporary" GUC, we're taking on a gigantic burden. Either we support it forever somehow, or we put it on a deprecation schedule immediately and expect to be answering questions about it for years after it's been removed. -1 for the GUC. Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
2017-05-03 18:54 GMT+02:00 David Fetter <david@fetter.org>:
On Wed, May 03, 2017 at 01:27:38PM -0300, Claudio Freire wrote:
> On Wed, May 3, 2017 at 11:31 AM, David Fetter <david@fetter.org> wrote:
> > Are you aware of such an ORM which both supports WITH and doesn't
> > also closely track PostgreSQL development? I'm not.
> >
> > Even assuming that such a thing exists, it's not at all obvious to
> > me that we should be stalling and/or putting in what will turn out
> > to be misfeatures to accommodate it.
>
> I know SQLAlchemy does support CTEs, and lags quite considerably in
> its support of the latest syntactic elements.
>
> For instance, it took them 8 months to support the "skip locked"
> option.
That is pretty strictly their problem.
> Not sure whether that qualifies as "closely tracking" postgres for
> you. Clearly they do track it, but that doesn't mean they're fast or
> as fast as one would like/need.
We can NOT make their tardiness a driver of our development.
> Sure, that might not be enough to warrant the GUC. I would think so,
> those are my 2 cents. YMMV.
When we add a "temporary" GUC, we're taking on a gigantic burden.
Either we support it forever somehow, or we put it on a deprecation
schedule immediately and expect to be answering questions about it for
years after it's been removed.
-1 for the GUC.
Is possible to find consensus without GUC? I understand well, why GUC is wrong, but I don't see any possible solution how to change current behave and don't break lot of applications.
Regards
Pavel
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
David Fetter wrote: > When we add a "temporary" GUC, we're taking on a gigantic burden. > Either we support it forever somehow, or we put it on a deprecation > schedule immediately and expect to be answering questions about it for > years after it's been removed. > > -1 for the GUC. Absolutely. So ISTM we have three choices: 1) we switch unmarked CTEs as inlineable by default in pg11. What seems likely to happen for a user that upgrades to pg11 is that 5 out of 10 CTE-using queries are going to become faster than with pg10, and they are going to be happy; 4 out of five are going to see no difference, but they didn't have to do anything about it; and the remaining query is going to become slower, either indistinguishably so (in which case they don't care and they remain happy because of the other improvements) or notably so, in which case they can easily figure where to add the MATERIALIZED option and regain the original performance. 2) unmarked CTEs continue to be an optimization barrier, but we add "WITH INLINED" so that they're inlineable. Some users may wonder about it and waste a lot of time trying to figure out which CTEs to add it to. They see a benefit in half the queries, which makes them happy, but they are angry that they had to waste all that time on the other queries. 3) We don't do anything, because we all agree that GUCs are not suitable. No progress. No anger, but nobody is happy either. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2017-05-03 19:33 GMT+02:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
David Fetter wrote:
> When we add a "temporary" GUC, we're taking on a gigantic burden.
> Either we support it forever somehow, or we put it on a deprecation
> schedule immediately and expect to be answering questions about it for
> years after it's been removed.
>
> -1 for the GUC.
Absolutely.
So ISTM we have three choices:
1) we switch unmarked CTEs as inlineable by default in pg11. What seems
likely to happen for a user that upgrades to pg11 is that 5 out of 10
CTE-using queries are going to become faster than with pg10, and they
are going to be happy; 4 out of five are going to see no difference, but
they didn't have to do anything about it; and the remaining query is
going to become slower, either indistinguishably so (in which case they
don't care and they remain happy because of the other improvements) or
notably so, in which case they can easily figure where to add the
MATERIALIZED option and regain the original performance.
2) unmarked CTEs continue to be an optimization barrier, but we add
"WITH INLINED" so that they're inlineable. Some users may wonder about
it and waste a lot of time trying to figure out which CTEs to add it to.
They see a benefit in half the queries, which makes them happy, but they
are angry that they had to waste all that time on the other queries.
3) We don't do anything, because we all agree that GUCs are not
suitable. No progress. No anger, but nobody is happy either.
yes, these variants are all.
@2 is only little bit better than bad @3 - is not nice to have default behave different than any other and than what 90% developers expects.
Regards
Pavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, May 03, 2017 at 02:33:05PM -0300, Alvaro Herrera wrote: > David Fetter wrote: > > > When we add a "temporary" GUC, we're taking on a gigantic burden. > > Either we support it forever somehow, or we put it on a deprecation > > schedule immediately and expect to be answering questions about it for > > years after it's been removed. > > > > -1 for the GUC. > > Absolutely. > > So ISTM we have three choices: > > 1) we switch unmarked CTEs as inlineable by default in pg11. What seems > likely to happen for a user that upgrades to pg11 is that 5 out of 10 > CTE-using queries are going to become faster than with pg10, and they > are going to be happy; 4 out of five are going to see no difference, but > they didn't have to do anything about it; and the remaining query is > going to become slower, either indistinguishably so (in which case they > don't care and they remain happy because of the other improvements) or > notably so, in which case they can easily figure where to add the > MATERIALIZED option and regain the original performance. > > > 2) unmarked CTEs continue to be an optimization barrier, but we add > "WITH INLINED" so that they're inlineable. Some users may wonder about > it and waste a lot of time trying to figure out which CTEs to add it to. > They see a benefit in half the queries, which makes them happy, but they > are angry that they had to waste all that time on the other queries. > > > 3) We don't do anything, because we all agree that GUCs are not > suitable. No progress. No anger, but nobody is happy either. > +1 for option 1. I just finished rewriting a well written CTE query to avoid the optimization fence and get reasonable performance. Regards, Ken
On 05/03/2017 07:33 PM, Alvaro Herrera wrote: > 1) we switch unmarked CTEs as inlineable by default in pg11. What seems > likely to happen for a user that upgrades to pg11 is that 5 out of 10 > CTE-using queries are going to become faster than with pg10, and they > are going to be happy; 4 out of five are going to see no difference, but > they didn't have to do anything about it; and the remaining query is > going to become slower, either indistinguishably so (in which case they > don't care and they remain happy because of the other improvements) or > notably so, in which case they can easily figure where to add the > MATERIALIZED option and regain the original performance. > > > 2) unmarked CTEs continue to be an optimization barrier, but we add > "WITH INLINED" so that they're inlineable. Some users may wonder about > it and waste a lot of time trying to figure out which CTEs to add it to. > They see a benefit in half the queries, which makes them happy, but they > are angry that they had to waste all that time on the other queries. > > > 3) We don't do anything, because we all agree that GUCs are not > suitable. No progress. No anger, but nobody is happy either. +1 for option 1. And while I would not like if we had to combine it with a backwards compatibility GUC which enables the old behavior to get it merged I still personally would prefer that over option 2 and 3. Andreas
On 5/3/17 9:54 PM, Andreas Karlsson wrote: > On 05/03/2017 07:33 PM, Alvaro Herrera wrote: >> 1) we switch unmarked CTEs as inlineable by default in pg11. What seems >> likely to happen for a user that upgrades to pg11 is that 5 out of 10 >> CTE-using queries are going to become faster than with pg10, and they >> are going to be happy; 4 out of five are going to see no difference, but >> they didn't have to do anything about it; and the remaining query is >> going to become slower, either indistinguishably so (in which case they >> don't care and they remain happy because of the other improvements) or >> notably so, in which case they can easily figure where to add the >> MATERIALIZED option and regain the original performance. >> >> >> 2) unmarked CTEs continue to be an optimization barrier, but we add >> "WITH INLINED" so that they're inlineable. Some users may wonder about >> it and waste a lot of time trying to figure out which CTEs to add it to. >> They see a benefit in half the queries, which makes them happy, but they >> are angry that they had to waste all that time on the other queries. >> >> >> 3) We don't do anything, because we all agree that GUCs are not >> suitable. No progress. No anger, but nobody is happy either. > > +1 for option 1. And while I would not like if we had to combine it with > a backwards compatibility GUC which enables the old behavior to get it > merged I still personally would prefer that over option 2 and 3. > > Andreas> +1 to what Andreas says -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, May 3, 2017 at 12:33 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > David Fetter wrote: > >> When we add a "temporary" GUC, we're taking on a gigantic burden. >> Either we support it forever somehow, or we put it on a deprecation >> schedule immediately and expect to be answering questions about it for >> years after it's been removed. >> >> -1 for the GUC. > > Absolutely. > > So ISTM we have three choices: > > 1) we switch unmarked CTEs as inlineable by default in pg11. What seems > likely to happen for a user that upgrades to pg11 is that 5 out of 10 > CTE-using queries are going to become faster than with pg10, and they > are going to be happy; 4 out of five are going to see no difference, but > they didn't have to do anything about it; and the remaining query is > going to become slower, either indistinguishably so (in which case they > don't care and they remain happy because of the other improvements) or > notably so, in which case they can easily figure where to add the > MATERIALIZED option and regain the original performance. +1 for option 1. This change will be welcome for a large number of queries, but forced materialization is a real need and I use it often. This comes off as a very reasonable compromise in my opinion unless it requires major coding gymnastics to implement. merlin
On 04/05/17 05:33, Alvaro Herrera wrote: > David Fetter wrote: > >> When we add a "temporary" GUC, we're taking on a gigantic burden. >> Either we support it forever somehow, or we put it on a deprecation >> schedule immediately and expect to be answering questions about it for >> years after it's been removed. >> >> -1 for the GUC. > Absolutely. > > So ISTM we have three choices: > > 1) we switch unmarked CTEs as inlineable by default in pg11. What seems > likely to happen for a user that upgrades to pg11 is that 5 out of 10 > CTE-using queries are going to become faster than with pg10, and they > are going to be happy; 4 out of five are going to see no difference, but > they didn't have to do anything about it; and the remaining query is > going to become slower, either indistinguishably so (in which case they > don't care and they remain happy because of the other improvements) or > notably so, in which case they can easily figure where to add the > MATERIALIZED option and regain the original performance. +1 I've watched a colleague spend hours trying to optimise a complicated query with nested views, then find that this 'optimisation fence' was the heart of the problem. Cheers, Gavin
On 05/03/2017 05:24 PM, Merlin Moncure wrote: > On Wed, May 3, 2017 at 12:33 PM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: >> David Fetter wrote: >> >>> When we add a "temporary" GUC, we're taking on a gigantic burden. >>> Either we support it forever somehow, or we put it on a deprecation >>> schedule immediately and expect to be answering questions about it for >>> years after it's been removed. >>> >>> -1 for the GUC. >> Absolutely. >> >> So ISTM we have three choices: >> >> 1) we switch unmarked CTEs as inlineable by default in pg11. What seems >> likely to happen for a user that upgrades to pg11 is that 5 out of 10 >> CTE-using queries are going to become faster than with pg10, and they >> are going to be happy; 4 out of five are going to see no difference, but >> they didn't have to do anything about it; and the remaining query is >> going to become slower, either indistinguishably so (in which case they >> don't care and they remain happy because of the other improvements) or >> notably so, in which case they can easily figure where to add the >> MATERIALIZED option and regain the original performance. > +1 for option 1. This change will be welcome for a large number of > queries, but forced materialization is a real need and I use it often. > This comes off as a very reasonable compromise in my opinion unless it > requires major coding gymnastics to implement. > The only thing I am totally dead set against is making people go back to using OFFSET 0. It's ugly and completely non-intuitive. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 03/05/17 23:24, Merlin Moncure wrote: > On Wed, May 3, 2017 at 12:33 PM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: >> David Fetter wrote: >> >>> When we add a "temporary" GUC, we're taking on a gigantic burden. >>> Either we support it forever somehow, or we put it on a deprecation >>> schedule immediately and expect to be answering questions about it for >>> years after it's been removed. >>> >>> -1 for the GUC. >> >> Absolutely. >> >> So ISTM we have three choices: >> >> 1) we switch unmarked CTEs as inlineable by default in pg11. What seems >> likely to happen for a user that upgrades to pg11 is that 5 out of 10 >> CTE-using queries are going to become faster than with pg10, and they >> are going to be happy; 4 out of five are going to see no difference, but >> they didn't have to do anything about it; and the remaining query is >> going to become slower, either indistinguishably so (in which case they >> don't care and they remain happy because of the other improvements) or >> notably so, in which case they can easily figure where to add the >> MATERIALIZED option and regain the original performance. > > +1 for option 1. This change will be welcome for a large number of > queries, but forced materialization is a real need and I use it often. > This comes off as a very reasonable compromise in my opinion unless it > requires major coding gymnastics to implement. > +1 to this -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
> 1) we switch unmarked CTEs as inlineable by default in pg11. +1 from me for option 1 -- View this message in context: http://www.postgresql-archive.org/CTE-inlining-tp5958992p5959615.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
On 04/05/17 05:33, Alvaro Herrera wrote:> David Fetter wrote:>>> When we add a "temporary" GUC, we're taking on a giganticburden.>> Either we support it forever somehow, or we put it on a deprecation>> schedule immediately and expect tobe answering questions about it for>> years after it's been removed.>>>> -1 for the GUC.>> Absolutely.>> So ISTM we havethree choices:>> 1) we switch unmarked CTEs as inlineable by default in pg11. What seems> likely to happen for a userthat upgrades to pg11 is that 5 out of 10> CTE-using queries are going to become faster than with pg10, and they> aregoing to be happy; 4 out of five are going to see no difference, but> they didn't have to do anything about it; and theremaining query is> going to become slower, either indistinguishably so (in which case they> don't care and they remainhappy because of the other improvements) or> notably so, in which case they can easily figure where to add the> MATERIALIZEDoption and regain the original performance. +1 (Mario, proxied by me) I've been asked to pass on comments by my colleague Mario of True Group regarding the pain of the current behaviour of CTE's being optimisation fences (as he doesn't normally follow this email group, I'll pass on any feedback): "Greetings, Jumping onto the bandwagon here. At True Group in Auckland, we use PostgreSQL as the platform for large-scale softwaredevelopment, often with extensive and complex schemas, views and queries. We frequently encounter poor query performance due to CTEs being an optimisation barrier. We are forced to use workaroundssuch as a set-returning function (parameterised view) in place of a view, and manually placing the neededquals (WHERE clauses) into each CTE. This creates headaches in large systems, where writing expressive SQL is essential, rather than employing workarounds that make code harder to understand and reuse. Our general assumption is that we should write SQL that describes what we want to achieve. The planner's job is todetermine how to do that efficiently. There is an argument that pushing quals into CTEs could reduce performance for some queries, especially if the qualis expensive. My answer is twofold: a) most queries will either experience no change or benefit from qual push-down, as usually the expensive part isthe query subtree, not the top-level qual. b) if a small proportion of queries are negatively affected, this is better addressed by improving the planner's costestimation. At worst, an explicit OPTIMIZATION BARRIER hint could be added. But I know there is much philosophicalobjection in the PostgreSQL community to planner hints. The irony is that a main argument for retaining currentCTE behaviour is that people rely on CTEs as implicit barrier hints! As an aside, we also encounter several other instances where qual push-down fails, including where rewording a queryin a way that is semantically identical can change whether push-down takes place. But probably the greatest bugbearis inability to push-down a qual into more than one subtree, meaning if a query has two or more subtrees, each of which is expensive, but only one row isneeded from each, determined by one qual at the top level, one subtree will get the qual pushed into it and run fast,while the others will do a full (expensive) table scan and be filtered only afterwards. There are also gains to be had in improving qual push-down in sub-selects with aggregates, and others. But I think these issues are only due to lackof resource to implement, rather than philosophical/political objections. We would like to see a Postgres version in the future that does much better planning in the areas I've mentioned,including but not limited to the present issue of CTEs. Our organisation may be able to contribute materially to this, if the political will is there, and others affected pitch in to achieve common goals.Not being expert in Postgres internals, I am not certain how difficult each of the problems is. I hope we have contributed usefully to the conversation, and invite feedback. Sincerely, Mario" Cheers, Gavin
On 30/04/17 16:28, Tom Lane wrote: > Craig Ringer <craig.ringer@2ndquadrant.com> writes: >> - as you noted, it is hard to decide when it's worth inlining vs >> materializing for CTE terms referenced more than once. > [ raised eyebrow... ] Please explain why the answer isn't trivially > "never". > > There's already a pretty large hill to climb here in the way of > breaking peoples' expectations about CTEs being optimization > fences. Breaking the documented semantics about CTEs being > single-evaluation seems to me to be an absolute non-starter. > > regards, tom lane > > Could not each CTE be only evaluated once, but restricted (as far as is practicable) to the rows actually needed by the body of the SELECT? Cheers, Gavin
On 04/05/2017 08:34, Petr Jelinek wrote: > On 03/05/17 23:24, Merlin Moncure wrote: >> On Wed, May 3, 2017 at 12:33 PM, Alvaro Herrera >> <alvherre@2ndquadrant.com> wrote: >>> David Fetter wrote: >>> >>>> When we add a "temporary" GUC, we're taking on a gigantic burden. >>>> Either we support it forever somehow, or we put it on a deprecation >>>> schedule immediately and expect to be answering questions about it for >>>> years after it's been removed. >>>> >>>> -1 for the GUC. >>> >>> Absolutely. >>> >>> So ISTM we have three choices: >>> >>> 1) we switch unmarked CTEs as inlineable by default in pg11. What seems >>> likely to happen for a user that upgrades to pg11 is that 5 out of 10 >>> CTE-using queries are going to become faster than with pg10, and they >>> are going to be happy; 4 out of five are going to see no difference, but >>> they didn't have to do anything about it; and the remaining query is >>> going to become slower, either indistinguishably so (in which case they >>> don't care and they remain happy because of the other improvements) or >>> notably so, in which case they can easily figure where to add the >>> MATERIALIZED option and regain the original performance. >> >> +1 for option 1. This change will be welcome for a large number of >> queries, but forced materialization is a real need and I use it often. >> This comes off as a very reasonable compromise in my opinion unless it >> requires major coding gymnastics to implement. >> > > +1 to this > +1 too -- Julien Rouhaud http://dalibo.com - http://dalibo.org
> On May 4, 2017, at 3:02 AM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote: > > On 30/04/17 16:28, Tom Lane wrote: >> Craig Ringer <craig.ringer@2ndquadrant.com> writes: >>> - as you noted, it is hard to decide when it's worth inlining vs >>> materializing for CTE terms referenced more than once. >> [ raised eyebrow... ] Please explain why the answer isn't trivially >> "never". >> >> There's already a pretty large hill to climb here in the way of >> breaking peoples' expectations about CTEs being optimization >> fences. Breaking the documented semantics about CTEs being >> single-evaluation seems to me to be an absolute non-starter. >> >> regards, tom lane >> >> > Could not each CTE be only evaluated once, but restricted (as far as is practicable) to the rows actually needed by thebody of the SELECT? > Tom, Are you worried about semantics or performance? With proper detection of mutating functions and snapshot isolation I do not see how a user would detect “lack of” singleevaluation. As for performance we’d be talking about what? An uncorrelated inner of a nested loop join? Anyway it seems to me that there a multiple properties at play here which are quite orthogonal. 1. Full materialization/Slow materialization/pipelining I cannot come up with any scenario where full materialization wouldbe beneficial from a performance point of view (which speaks to Gavin’s view). I can see it from a semantic point ofview when order of execution may matter (for example with embedded DML and triggers present). As soon as semantics areat play having syntax is absolutely the right thing: +1 for MATERIALIZE 2.Pushing predicates (or other operators) into the CTE. All this can ever do is reduce the number of rows being looked at. As long as the optimizer is careful, not to do what it isn’t supposed to do in a nested query (push past a mutating function)I don’t see the harm 3. Replicating the CTE to push distinct operators from different consumers. Again this can only be done if there are nomutators or non deterministic operators. To summarize: big +1 to preserve the existing behavior with MATERIALIZE and to set CTE’s free by default with the onus onthe optimizer to prove semantic equivalence. Cheers Serge Rielau Salesforce.com
Serge Rielau <serge@rielau.com> writes: >> On May 4, 2017, at 3:02 AM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote: >> On 30/04/17 16:28, Tom Lane wrote: >>> There's already a pretty large hill to climb here in the way of >>> breaking peoples' expectations about CTEs being optimization >>> fences. Breaking the documented semantics about CTEs being >>> single-evaluation seems to me to be an absolute non-starter. > Are you worried about semantics or performance? > With proper detection of mutating functions and snapshot isolation I do not see how a user would detect “lack of” singleevaluation. I do not think a user will have any trouble "detecting" what we did if his query runs for two hours, rather than two minutes, because we executed some expensive function 100 times rather than the one time he expected it to run. Now you could argue that that's user error because he should have marked the expensive function with a sufficiently high cost to discourage us from flattening the CTE. But not everyone will have done that (and I'm not sure we have any planner smarts that would respond to such cases anyway). So what I'm saying is that if you're promising there will be no visible bad consequences, you're wrong. It may be worth breaking some peoples' queries to make other peoples' queries faster, but I think we need to tread pretty carefully there. And we definitely can't change any case where there would be visible semantic differences. regards, tom lane
On 05/04/2017 11:36 AM, Tom Lane wrote: > Serge Rielau <serge@rielau.com> writes: >>> On May 4, 2017, at 3:02 AM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote: >>> On 30/04/17 16:28, Tom Lane wrote: >>>> There's already a pretty large hill to climb here in the way of >>>> breaking peoples' expectations about CTEs being optimization >>>> fences. Breaking the documented semantics about CTEs being >>>> single-evaluation seems to me to be an absolute non-starter. >> Are you worried about semantics or performance? >> With proper detection of mutating functions and snapshot isolation I do not see how a user would detect “lack of” singleevaluation. > I do not think a user will have any trouble "detecting" what we did > if his query runs for two hours, rather than two minutes, because we > executed some expensive function 100 times rather than the one time > he expected it to run. > > Now you could argue that that's user error because he should have > marked the expensive function with a sufficiently high cost to > discourage us from flattening the CTE. But not everyone will have > done that (and I'm not sure we have any planner smarts that would > respond to such cases anyway). So what I'm saying is that if you're > promising there will be no visible bad consequences, you're wrong. > > It may be worth breaking some peoples' queries to make other peoples' > queries faster, but I think we need to tread pretty carefully there. > And we definitely can't change any case where there would be visible > semantic differences. > Yeah, the idea that this won't cause possibly significant pain is quite wrong. Quite by accident I came across an examplejust this morning where rewriting as a CTE makes a big improvement. I wrote this query: select (json_populate_record(null::mytype, myjson)).* from mytable; It turned out that this was an order of magnitude faster: with r as ( select json_populate_record(null::mytype, myjson) as x from mytable ) select (x).* fromr; cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Yeah, the idea that this won't cause possibly significant pain is quite wrong. Quite by accident I came across an example just this morning where rewriting as a CTE makes a big improvement.
I wrote this query:
select (json_populate_record(null::mytype, myjson)).*
from mytable;
It turned out that this was an order of magnitude faster:
with r as
(
select json_populate_record(null::mytype, myjson) as x
from mytable
)
select (x).*
from r;
Except I suspect we at least have a chance to detect the above and not de-optimize it by evaluating "json_populate_record" once for every column in mytype.
The now idiomatic solution to the above is to use LATERAL so the above CTE is no longer actually a required workaround.
David J.
On 2017-05-04 09:34:19 -0700, David G. Johnston wrote: > On Thu, May 4, 2017 at 9:22 AM, Andrew Dunstan < > andrew.dunstan@2ndquadrant.com> wrote: > > > > > Yeah, the idea that this won't cause possibly significant pain is quite > > wrong. Quite by accident I came across an example just this morning where > > rewriting as a CTE makes a big improvement. > > > > I wrote this query: > > > > select (json_populate_record(null::mytype, myjson)).* > > from mytable; > > > > > > It turned out that this was an order of magnitude faster: > > > > with r as > > ( > > select json_populate_record(null::mytype, myjson) as x > > from mytable > > ) > > select (x).* > > from r; That's a very specific issue, that you can just as well solve with CTEs though. IIRC Tom and I were discussing changing this - independently of anything CTE releated - so it doesn't end up calling the function multiple times. We e.g. could now quite easily put the json_populate_record(null::mytype, myjson) into something *like* a nodeProjectSet node, and handle the column accesses via the nromal projection. > Except I suspect we at least have a chance to detect the above and not > de-optimize it by evaluating "json_populate_record" once for every column > in mytype. That's what happens with the non CTE version - don't think it's super likely we'd end up doing that for CTEs; I'd even consider it a bug. Greetings, Andres Freund
On 05/04/2017 12:34 PM, David G. Johnston wrote: > On Thu, May 4, 2017 at 9:22 AM, Andrew Dunstan > <andrew.dunstan@2ndquadrant.com > <mailto:andrew.dunstan@2ndquadrant.com>>wrote: > > > Yeah, the idea that this won't cause possibly significant pain is > quite wrong. Quite by accident I came across an example just this > morning where rewriting as a CTE makes a big improvement. > > I wrote this query: > > select (json_populate_record(null::mytype, myjson)).* > from mytable; > > > It turned out that this was an order of magnitude faster: > > with r as > ( > select json_populate_record(null::mytype, myjson) as x > from mytable > ) > select (x).* > from r; > > > Except I suspect we at least have a chance to detect the above and > not de-optimize it by evaluating "json_populate_record" once for every > column in mytype. > > The now idiomatic solution to the above is to use LATERAL so the > above CTE is no longer actually a required workaround. Hadn't though about LATERAL, good point. Still, there will be other cases. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andrew Dunstan wrote: > Hadn't though about LATERAL, good point. Still, there will be other cases. I'm not sure what your point is. We know that for some cases the optimization barrier semantics are useful, which is why the proposal is to add a keyword to install one explicitely: with materialized r as ( select json_populate_record(null::mytype, myjson) as x from mytable ) select (x).* from r; this would preserve the current semantics. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 05/04/2017 01:33 PM, Alvaro Herrera wrote: > Andrew Dunstan wrote: > >> Hadn't though about LATERAL, good point. Still, there will be other cases. > I'm not sure what your point is. We know that for some cases the > optimization barrier semantics are useful, which is why the proposal is > to add a keyword to install one explicitely: > > with materialized r as > ( > select json_populate_record(null::mytype, myjson) as x > from mytable > ) > select (x).* > from r; > > this would preserve the current semantics. > I understand. Some people appear to think hardly anyone will actually need this - that's what I was arguing against. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 05/04/2017 10:33 AM, Alvaro Herrera wrote: > I'm not sure what your point is. We know that for some cases the > optimization barrier semantics are useful, which is why the proposal is > to add a keyword to install one explicitely: > > with materialized r as > ( > select json_populate_record(null::mytype, myjson) as x > from mytable > ) > select (x).* > from r; > > this would preserve the current semantics. I haven't been able to follow this incredibly long thread, so please excuse me if way off base, but are we talking about that a CTE would be silently be rewritten as an inline expression potentially unless it is decorated with some new syntax? I would find that very disconcerting myself. For example, would this CTE potentially get rewritten with multiple evaluation as follows? DROP SEQUENCE IF EXISTS foo_seq; CREATE SEQUENCE foo_seq; WITH a(f1) AS (SELECT nextval('foo_seq')) SELECT a.f1, a.f1 FROM a;f1 | ?column? ----+---------- 1 | 1 (1 row) ALTER SEQUENCE foo_seq RESTART; SELECT nextval('foo_seq'), nextval('foo_seq');nextval | ?column? ---------+---------- 1 | 2 (1 row) Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
On 05/04/2017 01:52 PM, Joe Conway wrote: > On 05/04/2017 10:33 AM, Alvaro Herrera wrote: >> I'm not sure what your point is. We know that for some cases the >> optimization barrier semantics are useful, which is why the proposal is >> to add a keyword to install one explicitely: >> >> with materialized r as >> ( >> select json_populate_record(null::mytype, myjson) as x >> from mytable >> ) >> select (x).* >> from r; >> >> this would preserve the current semantics. > I haven't been able to follow this incredibly long thread, so please > excuse me if way off base, but are we talking about that a CTE would be > silently be rewritten as an inline expression potentially unless it is > decorated with some new syntax? > > I would find that very disconcerting myself. For example, would this CTE > potentially get rewritten with multiple evaluation as follows? > > DROP SEQUENCE IF EXISTS foo_seq; > CREATE SEQUENCE foo_seq; > > WITH a(f1) AS (SELECT nextval('foo_seq')) > SELECT a.f1, a.f1 FROM a; > f1 | ?column? > ----+---------- > 1 | 1 > (1 row) > > ALTER SEQUENCE foo_seq RESTART; > SELECT nextval('foo_seq'), nextval('foo_seq'); > nextval | ?column? > ---------+---------- > 1 | 2 > (1 row) > I think that would be a change in semantics, which we should definitely not be getting. Avoiding a change in semantics might be an interesting exercise, but we have lots of clever coders ... cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 5/4/17 7:56 PM, Andrew Dunstan wrote: > > > On 05/04/2017 01:52 PM, Joe Conway wrote: >> On 05/04/2017 10:33 AM, Alvaro Herrera wrote: >>> I'm not sure what your point is. We know that for some cases the >>> optimization barrier semantics are useful, which is why the proposal is >>> to add a keyword to install one explicitely: >>> >>> with materialized r as >>> ( >>> select json_populate_record(null::mytype, myjson) as x >>> from mytable >>> ) >>> select (x).* >>> from r; >>> >>> this would preserve the current semantics. >> I haven't been able to follow this incredibly long thread, so please >> excuse me if way off base, but are we talking about that a CTE would be >> silently be rewritten as an inline expression potentially unless it is >> decorated with some new syntax? >> >> I would find that very disconcerting myself. For example, would this CTE >> potentially get rewritten with multiple evaluation as follows? >> >> DROP SEQUENCE IF EXISTS foo_seq; >> CREATE SEQUENCE foo_seq; >> >> WITH a(f1) AS (SELECT nextval('foo_seq')) >> SELECT a.f1, a.f1 FROM a; >> f1 | ?column? >> ----+---------- >> 1 | 1 >> (1 row) >> >> ALTER SEQUENCE foo_seq RESTART; >> SELECT nextval('foo_seq'), nextval('foo_seq'); >> nextval | ?column? >> ---------+---------- >> 1 | 2 >> (1 row) >> > > > > I think that would be a change in semantics, which we should definitely > not be getting. Avoiding a change in semantics might be an interesting > exercise, but we have lots of clever coders ... > nextval is a volatile function, and in my understanding the plan was not to inline CTEs with volatile functions (or CTEs doing writes etc.). That is, we'd guarantee the same results as we get now. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 05/04/2017 10:56 AM, Andrew Dunstan wrote: > > > On 05/04/2017 01:52 PM, Joe Conway wrote: >> On 05/04/2017 10:33 AM, Alvaro Herrera wrote: >>> I'm not sure what your point is. We know that for some cases the >>> optimization barrier semantics are useful, which is why the proposal is >>> to add a keyword to install one explicitely: >>> >>> with materialized r as >>> ( >>> select json_populate_record(null::mytype, myjson) as x >>> from mytable >>> ) >>> select (x).* >>> from r; >>> >>> this would preserve the current semantics. >> I haven't been able to follow this incredibly long thread, so please >> excuse me if way off base, but are we talking about that a CTE would be >> silently be rewritten as an inline expression potentially unless it is >> decorated with some new syntax? >> >> I would find that very disconcerting myself. For example, would this CTE >> potentially get rewritten with multiple evaluation as follows? >> >> DROP SEQUENCE IF EXISTS foo_seq; >> CREATE SEQUENCE foo_seq; >> >> WITH a(f1) AS (SELECT nextval('foo_seq')) >> SELECT a.f1, a.f1 FROM a; >> f1 | ?column? >> ----+---------- >> 1 | 1 >> (1 row) >> >> ALTER SEQUENCE foo_seq RESTART; >> SELECT nextval('foo_seq'), nextval('foo_seq'); >> nextval | ?column? >> ---------+---------- >> 1 | 2 >> (1 row) >> > > I think that would be a change in semantics, which we should definitely > not be getting. Avoiding a change in semantics might be an interesting > exercise, but we have lots of clever coders ... Well I think my point is that I always have understood CTEs to be executed precisely once producing a temporary result set that is then referenced elsewhere. I don't think that property of CTEs should change. Somewhere else in the thread someone mentioned predicate push down -- that makes sense and maybe some clever coder can come up with a patch that does that, but I would not be in favor of CTEs being inlined and therefore evaluated multiple times. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > I'm not sure what your point is. We know that for some cases the > optimization barrier semantics are useful, which is why the proposal is > to add a keyword to install one explicitely: > with materialized r as > ( > select json_populate_record(null::mytype, myjson) as x > from mytable > ) > select (x).* > from r; > this would preserve the current semantics. Uh, no, it wouldn't, until you had changed your query (and made it non-SQL-compliant, and unable to work at all on pre-v11 PG). This might be a good design, but surely it does not provide backwards compatibility. regards, tom lane
On 5/4/17 8:03 PM, Joe Conway wrote: > On 05/04/2017 10:56 AM, Andrew Dunstan wrote: >> >> >> On 05/04/2017 01:52 PM, Joe Conway wrote: >>> On 05/04/2017 10:33 AM, Alvaro Herrera wrote: >>>> I'm not sure what your point is. We know that for some cases the >>>> optimization barrier semantics are useful, which is why the proposal is >>>> to add a keyword to install one explicitely: >>>> >>>> with materialized r as >>>> ( >>>> select json_populate_record(null::mytype, myjson) as x >>>> from mytable >>>> ) >>>> select (x).* >>>> from r; >>>> >>>> this would preserve the current semantics. >>> I haven't been able to follow this incredibly long thread, so please >>> excuse me if way off base, but are we talking about that a CTE would be >>> silently be rewritten as an inline expression potentially unless it is >>> decorated with some new syntax? >>> >>> I would find that very disconcerting myself. For example, would this CTE >>> potentially get rewritten with multiple evaluation as follows? >>> >>> DROP SEQUENCE IF EXISTS foo_seq; >>> CREATE SEQUENCE foo_seq; >>> >>> WITH a(f1) AS (SELECT nextval('foo_seq')) >>> SELECT a.f1, a.f1 FROM a; >>> f1 | ?column? >>> ----+---------- >>> 1 | 1 >>> (1 row) >>> >>> ALTER SEQUENCE foo_seq RESTART; >>> SELECT nextval('foo_seq'), nextval('foo_seq'); >>> nextval | ?column? >>> ---------+---------- >>> 1 | 2 >>> (1 row) >>> >> >> I think that would be a change in semantics, which we should definitely >> not be getting. Avoiding a change in semantics might be an interesting >> exercise, but we have lots of clever coders ... > > Well I think my point is that I always have understood CTEs to be > executed precisely once producing a temporary result set that is then > referenced elsewhere. I don't think that property of CTEs should change. > Somewhere else in the thread someone mentioned predicate push down -- > that makes sense and maybe some clever coder can come up with a patch > that does that, but I would not be in favor of CTEs being inlined and > therefore evaluated multiple times. > I agree with this, but there's a difference between "executed exactly once" and "producing the same result as if executed exactly once". I may be misunderstanding what other people proposed in this thread, but I think the plan was to only inline CTEs where we know it won't change the results, etc. So e.g. CTEs with volatile functions would not get inlined, which includes nextval() for example. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On 5/4/17 8:03 PM, Joe Conway wrote: >>> I haven't been able to follow this incredibly long thread, so please >>> excuse me if way off base, but are we talking about that a CTE would be >>> silently be rewritten as an inline expression potentially unless it is >>> decorated with some new syntax? > I agree with this, but there's a difference between "executed exactly > once" and "producing the same result as if executed exactly once". > I may be misunderstanding what other people proposed in this thread, but > I think the plan was to only inline CTEs where we know it won't change > the results, etc. So e.g. CTEs with volatile functions would not get > inlined, which includes nextval() for example. I haven't been keeping close tabs either, but surely we still have to have the optimization fence in (at least) all these cases: * CTE contains INSERT/UPDATE/DELETE * CTE contains SELECT FOR UPDATE/SHARE (else the set of rows that get locked might change) * CTE contains volatile functions I'm willing to write off cases where, eg, a function should have been marked volatile and was not. That's user error and there are plenty of hazards of that kind already. But if the optimizer has reason to know that discarding the fence might change any query side-effects, it mustn't. regards, tom lane
I haven't been keeping close tabs either, but surely we still have to have
the optimization fence in (at least) all these cases:
* CTE contains INSERT/UPDATE/DELETE
* CTE contains SELECT FOR UPDATE/SHARE (else the set of rows that get
locked might change)
* CTE contains volatile functions
I'm willing to write off cases where, eg, a function should have been
marked volatile and was not. That's user error and there are plenty
of hazards of that kind already. But if the optimizer has reason
to know that discarding the fence might change any query side-effects,
it mustn't.
Yes! +100
Cheers
Serge
On 05/05/17 06:39, Tomas Vondra wrote: > > > On 5/4/17 8:03 PM, Joe Conway wrote: >> On 05/04/2017 10:56 AM, Andrew Dunstan wrote: >>> >>> >>> On 05/04/2017 01:52 PM, Joe Conway wrote: >>>> On 05/04/2017 10:33 AM, Alvaro Herrera wrote: >>>>> I'm not sure what your point is. We know that for some cases the >>>>> optimization barrier semantics are useful, which is why the >>>>> proposal is >>>>> to add a keyword to install one explicitely: >>>>> >>>>> with materialized r as >>>>> ( >>>>> select json_populate_record(null::mytype, myjson) as x >>>>> from mytable >>>>> ) >>>>> select (x).* >>>>> from r; >>>>> >>>>> this would preserve the current semantics. >>>> I haven't been able to follow this incredibly long thread, so please >>>> excuse me if way off base, but are we talking about that a CTE >>>> would be >>>> silently be rewritten as an inline expression potentially unless it is >>>> decorated with some new syntax? >>>> >>>> I would find that very disconcerting myself. For example, would >>>> this CTE >>>> potentially get rewritten with multiple evaluation as follows? >>>> >>>> DROP SEQUENCE IF EXISTS foo_seq; >>>> CREATE SEQUENCE foo_seq; >>>> >>>> WITH a(f1) AS (SELECT nextval('foo_seq')) >>>> SELECT a.f1, a.f1 FROM a; >>>> f1 | ?column? >>>> ----+---------- >>>> 1 | 1 >>>> (1 row) >>>> >>>> ALTER SEQUENCE foo_seq RESTART; >>>> SELECT nextval('foo_seq'), nextval('foo_seq'); >>>> nextval | ?column? >>>> ---------+---------- >>>> 1 | 2 >>>> (1 row) >>>> >>> >>> I think that would be a change in semantics, which we should definitely >>> not be getting. Avoiding a change in semantics might be an interesting >>> exercise, but we have lots of clever coders ... >> >> Well I think my point is that I always have understood CTEs to be >> executed precisely once producing a temporary result set that is then >> referenced elsewhere. I don't think that property of CTEs should change. >> Somewhere else in the thread someone mentioned predicate push down -- >> that makes sense and maybe some clever coder can come up with a patch >> that does that, but I would not be in favor of CTEs being inlined and >> therefore evaluated multiple times. >> > > I agree with this, but there's a difference between "executed exactly > once" and "producing the same result as if executed exactly once". > > I may be misunderstanding what other people proposed in this thread, > but I think the plan was to only inline CTEs where we know it won't > change the results, etc. So e.g. CTEs with volatile functions would > not get inlined, which includes nextval() for example. > > regards > It was the behaviour of "producing the same result as if executed exactly once" that I was thinking of - I think this is still valid for triggers & volatile functions, but such behaviour should be clearly documented. This what I implicitly thought about CTE's when I first came across them - to me it is the intuitively obvious behaviour. However, limiting the rows based on the body of the SELECT would often be a very useful optimisation Cheers, Gavin
On 05/04/2017 06:22 PM, Andrew Dunstan wrote: > I wrote this query: > > select (json_populate_record(null::mytype, myjson)).* > from mytable; > > > It turned out that this was an order of magnitude faster: > > with r as > ( > select json_populate_record(null::mytype, myjson) as x > from mytable > ) > select (x).* > from r; I do not know the planner that well, but I imagined that when we remove the optimization fence that one would be evaluated similar to if it had been a lateral join, i.e. there would be no extra function calls in this case after removing the fence. Andreas
On 5 May 2017 02:52, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>>> I haven't been able to follow this incredibly long thread, so please
>>> excuse me if way off base, but are we talking about that a CTE would be
>>> silently be rewritten as an inline expression potentially unless it is
>>> decorated with some new syntax?> I agree with this, but there's a difference between "executed exactlyI haven't been keeping close tabs either, but surely we still have to have
> once" and "producing the same result as if executed exactly once".
> I may be misunderstanding what other people proposed in this thread, but
> I think the plan was to only inline CTEs where we know it won't change
> the results, etc. So e.g. CTEs with volatile functions would not get
> inlined, which includes nextval() for example.
the optimization fence in (at least) all these cases:
* CTE contains INSERT/UPDATE/DELETE
* CTE contains SELECT FOR UPDATE/SHARE (else the set of rows that get
locked might change)
* CTE contains volatile functions
I'm willing to write off cases where, eg, a function should have been
marked volatile and was not. That's user error and there are plenty
of hazards of that kind already. But if the optimizer has reason
to know that discarding the fence might change any query side-effects,
it mustn't.
I think everyone is in total agreement there.
On 5 May 2017 06:04, "Andreas Karlsson" <andreas@proxel.se> wrote:
On 05/04/2017 06:22 PM, Andrew Dunstan wrote:I do not know the planner that well, but I imagined that when we remove the optimization fence that one would be evaluated similar to if it had been a lateral join, i.e. there would be no extra function calls in this case after removing the fence.I wrote this query:
select (json_populate_record(null::mytype, myjson)).*
from mytable;
It turned out that this was an order of magnitude faster:
with r as
(
select json_populate_record(null::mytype, myjson) as x
from mytable
)
select (x).*
from r;
Sort of. PostgreSQL has a wart around (x).* expansion where it's essentially macro-expanded into
(x).a, (x).b, (x).c, ...
Now, if x is a function call, PG will merrily execute it n times for its n output columns.
Andres is working on fixing this. And it's trivially worked around with a lateral query ; the above would be better written as
select (x).*
from mytable
cross join lateral json_populate_record(null::myt ype, myjson) as x;
So this example just abuses our optimiser hint behaviour for CTEs to avoid solving a planner issue (why project policy is against hints). But there's already a solution.
I'm finding it increasingly hilarious watching people vociferously defending their one precious (semi-documented) query/optimiser hint in PostgreSQL. The one we don't admit is a hint, but treat as one by avoiding optimising across it when it's safe to do so.
We can't remove or change our precious hint because we need it to solve production issues. But we don't have hints because then people wouldn't report planner/optimiser issues, would lock in bad plans and then complain about it, etc.
Just like what's happening here. And people are leaping to defend it, lest we risk exposing performance issues by changing anything, even though all we're doing is documenting what is already so.
Hey. Crazy idea for backward compat to address Tom's complaint that adding explicit syntax would require people who wanted the old behaviour to make their queries incompatible with pg10 and below. Add the "MATERIALIZED" keyword or whatever. The back patch the keyword as a no-op, since that's what we already do in back branches. I can't see anything that could possibly break in that context so long as we only go as far back as it was already a keyword elsewhere.
We could at least add it to pg10.
On 05/04/2017 05:03 PM, Craig Ringer wrote: > On 5 May 2017 02:52, "Tom Lane" wrote: > I haven't been keeping close tabs either, but surely we still have > to have > the optimization fence in (at least) all these cases: > > * CTE contains INSERT/UPDATE/DELETE > * CTE contains SELECT FOR UPDATE/SHARE (else the set of rows that get > locked might change) > * CTE contains volatile functions > > I'm willing to write off cases where, eg, a function should have been > marked volatile and was not. That's user error and there are plenty > of hazards of that kind already. But if the optimizer has reason > to know that discarding the fence might change any query side-effects, > it mustn't. > > I think everyone is in total agreement there. That's great, but if so, why do we need any change in syntax at all? Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
On 5 May 2017 at 08:17, Joe Conway <mail@joeconway.com> wrote: > On 05/04/2017 05:03 PM, Craig Ringer wrote: >> On 5 May 2017 02:52, "Tom Lane" wrote: >> I haven't been keeping close tabs either, but surely we still have >> to have >> the optimization fence in (at least) all these cases: >> >> * CTE contains INSERT/UPDATE/DELETE >> * CTE contains SELECT FOR UPDATE/SHARE (else the set of rows that get >> locked might change) >> * CTE contains volatile functions >> >> I'm willing to write off cases where, eg, a function should have been >> marked volatile and was not. That's user error and there are plenty >> of hazards of that kind already. But if the optimizer has reason >> to know that discarding the fence might change any query side-effects, >> it mustn't. >> >> I think everyone is in total agreement there. > > That's great, but if so, why do we need any change in syntax at all? Because a lot of people use it as a query hint, and don't want to let go of the one precious query hint they rely on in PostgreSQL to solve urgent production issues or work around planner limitations. You can currently (ab)use a CTE with fencing behaviour to force join order, force evaluation of expensive functions, etc, if the planner would otherwise pick a semantically identical plan that lands up running slower. Many people don't want to give up their hint because it's an important tool in their "solve user/customer problems fast" toolbox. Project policy says we don't want query hints, don't have them, and won't add them. I understand the reasoning for this (and mostly agree). But we kind of added one by accident anyway, and it's proving rather hard to take it away! We added it by documenting an implementation detail/limitation - that there's limited or no inlining/pullup/pushdown across CTE terms. People took that as blessing to rely on that behaviour in future releases, treating it as an optimisation barrier query hint. Since then some people have been promoting it as a workaround for performance issues faced by queries with sub-optimal plans. People "in the know" used to use OFFSET 0 for that instead, but it was entirely undocumented and could go away in a future release without any warning, plus it was uglier. The CTE limitation-as-feature got a lot more press in blogs, etc since it's documented-ish. So now we're collectively trying to get to a compromise position where people who rely on it can keep the functionality by declaring the no-inline requirement explicitly in their SQL where they depend on it. That way it won't penalise everyone else who just wants the optimiser to do its best the rest of the time, so we can just write SQL that says "I want this outcome" without having to worry about a semi-documented performance booby-trap. So surprise and confuse users who're (reasonably) expecting the optimiser to treat a CTE like a subquery, view or SQL function, those who're writing portable SQL and those who're porting from other DBMSes. The sticking point is that this seems to require admitting that it serves as a hint, of sorts, where it forces the optimiser to choose a different plan than its cost based estimates would suggest if all candidate plans are semantically identical. We're carefully maintaining this bizarre cognitive dissonance where we justify the need for using this as a planner hint at the same time as denying that we have a hint. That makes it hard to make progress here. I think there's fear that we're setting some kind of precedent by admitting what we already have. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Remote DBA, Training &Services
Craig Ringer <craig.ringer@2ndquadrant.com> writes: > We're carefully maintaining this bizarre cognitive dissonance where we > justify the need for using this as a planner hint at the same time as > denying that we have a hint. That makes it hard to make progress here. > I think there's fear that we're setting some kind of precedent by > admitting what we already have. I think you're overstating the case. It's clear that there's a significant subset of CTE functionality where there has to be an optimization fence. The initial implementation basically took the easy way out by deeming *all* CTEs to be optimization fences. Maybe we shouldn't have documented that behavior, but we did. Now we're arguing about how much of a compatibility break it'd be to change that planner behavior. I don't see any particular cognitive dissonance here, just disagreements about the extent to which backwards compatibility is more important than better query optimization. regards, tom lane
In my past life when I was faced with such debates I argued that the number of customers We are hoping to attract in the future is much bigger than the ones we risk offending.
Doesn't mean I wanted to piss everyone off. Just that I didn't want to be held hostage by history.
Cheers
Serge
PS: On the opposing side was typically the mainframe crowd.... hint hint
Tom, all, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Craig Ringer <craig.ringer@2ndquadrant.com> writes: > > We're carefully maintaining this bizarre cognitive dissonance where we > > justify the need for using this as a planner hint at the same time as > > denying that we have a hint. That makes it hard to make progress here. > > I think there's fear that we're setting some kind of precedent by > > admitting what we already have. > > I think you're overstating the case. It's clear that there's a > significant subset of CTE functionality where there has to be an > optimization fence. The initial implementation basically took the > easy way out by deeming *all* CTEs to be optimization fences. Maybe > we shouldn't have documented that behavior, but we did. Now we're > arguing about how much of a compatibility break it'd be to change that > planner behavior. I don't see any particular cognitive dissonance here, > just disagreements about the extent to which backwards compatibility is > more important than better query optimization. This really begs the question of if it's "better." If it's always better, then there's no discussion to be had. If there are cases where it's not, then that could cause problems for people and there's a reason to question if we're ok with that. Of course, if a user runs into such an issue, there's a few ways they can address it without needs any additional syntax (use a temp table instead, introduce a volative function into the CTE, etc). I had originally been on the fence about this myself, and even advocated the new-syntax approach at one point (as I recall), but with the new statistics we're gaining and the improvmenets in the planner, along with parallel query and the advantages offered by being able to pull a CTE into the main query, I think I've come around to agree that removing the optimization fence, when it won't change the resullts, is a good thing to do. Thanks! Stephen
On 05/04/2017 07:04 PM, Tom Lane wrote: > Craig Ringer <craig.ringer@2ndquadrant.com> writes: >> We're carefully maintaining this bizarre cognitive dissonance where we >> justify the need for using this as a planner hint at the same time as >> denying that we have a hint. That makes it hard to make progress here. >> I think there's fear that we're setting some kind of precedent by >> admitting what we already have. > > I think you're overstating the case. It's clear that there's a > significant subset of CTE functionality where there has to be an > optimization fence. The initial implementation basically took the > easy way out by deeming *all* CTEs to be optimization fences. Maybe > we shouldn't have documented that behavior, but we did. Now we're > arguing about how much of a compatibility break it'd be to change that > planner behavior. I don't see any particular cognitive dissonance here, > just disagreements about the extent to which backwards compatibility is > more important than better query optimization. Exactly. One thought, is that we treat a CTE in a similar way to foreign tables, with the same set of push downs. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Hi, On 2017-05-04 19:57:21 -0700, Joe Conway wrote: > One thought, is that we treat a CTE in a similar way to foreign tables, > with the same set of push downs. A bit surprised about that suggestion - there seems to be very little similarity between the cases. What'd be the benefit of that? With FDWs there's not even a common set of things pushed down between different FDW users. Greetings, Andres Freund
Thomas Kellerer wrote: >> 1) we switch unmarked CTEs as inlineable by default in pg11. > > +1 from me for option 1 +1 from me as well, FWIW. Yours, Laurenz Albe
On 5 May 2017 at 14:04, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Craig Ringer <craig.ringer@2ndquadrant.com> writes: >> We're carefully maintaining this bizarre cognitive dissonance where we >> justify the need for using this as a planner hint at the same time as >> denying that we have a hint. That makes it hard to make progress here. >> I think there's fear that we're setting some kind of precedent by >> admitting what we already have. > > I think you're overstating the case. It's clear that there's a > significant subset of CTE functionality where there has to be an > optimization fence. The initial implementation basically took the > easy way out by deeming *all* CTEs to be optimization fences. Maybe > we shouldn't have documented that behavior, but we did. Now we're > arguing about how much of a compatibility break it'd be to change that > planner behavior. I don't see any particular cognitive dissonance here, > just disagreements about the extent to which backwards compatibility is > more important than better query optimization. How about we get the ball rolling on this in v10 and pull that part out of the docs. If anything that'll buy us a bit more wiggle room to change this in v11. I've attached a proposed patch. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On 05/03/2017 07:33 PM, Alvaro Herrera wrote: > 1) we switch unmarked CTEs as inlineable by default in pg11. What seems > likely to happen for a user that upgrades to pg11 is that 5 out of 10 > CTE-using queries are going to become faster than with pg10, and they > are going to be happy; 4 out of five are going to see no difference, but > they didn't have to do anything about it; and the remaining query is > going to become slower, either indistinguishably so (in which case they > don't care and they remain happy because of the other improvements) or > notably so, in which case they can easily figure where to add the > MATERIALIZED option and regain the original performance. I am overwhelmingly in favor of this option. I am in favor of an enable_inlinedcte guc in the same spirit as the other enable_* gucs, but violently opposed to any "compatibility" guc. -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Ok, it seems that most people in discussion are agree that removing optimization fence is a right thing to do. But so far the main topic was whether it worth to make "inlining" by default, and how we should enable it. Nonetheless I still hoping to discuss the algorithm and its implementation. I suppose, in case of a single reference we can validate CTE subquery and inline it just before SS_process_ctes() in subquery_planner() and then process remaining CTEs as before. The case of multiple reference is more interesting. Ideally, we would decide whether to inline just before pull_up_sublinks(), so all the optimizations can be applied to inlined subquery. But It is impossible as we have no information to build subquery paths and estimate they costs at this point. All necessary initialization is performed in query_planner(), that invoked far later in grouping_planner(). (As far as I understand.) The most straighforward way is to compare CTE scan cost with subquery execution and result scan cost in set_rel_size(), just after set_cte_pathlist(), and alter RelOptInfo, if we choose to inline. (e.g (CTE scan) < (cheapest_path(subquery) + subquery scan)) This way we still can push down predicates as it is performed in set_subquery_pathlist(), but we missed pull_up_subquery(). Besides, it seems like a dirty quick solution. Maybe it possible to add subquery scan to RTE_CTE RelOptInfo, but I'm not sure. So what is a right way to conduct comparison between CTE scan and subquery execution with subsequent scan? I am new to PostgreSQL development, so I need a guidance from someone who familiar with optimizer infrastructure to ensure that I moving in a right direction and not making something weird. P.S. There is a paper [1] describing implementation of CTE optimization in Orca optimizer. It may be useful, though architecture is completely different. [1] Optimization of Common Table Expressions in MPP Database Systems (http://www.vldb.org/pvldb/vol8/p1704-elhelw.pdf) Ilya Shkuratov
On 5/5/17 08:43, David Rowley wrote: > How about we get the ball rolling on this in v10 and pull that part > out of the docs. If anything that'll buy us a bit more wiggle room to > change this in v11. > > I've attached a proposed patch. If we just tell them that the thing they might have relied on might go away, without a replacement to suggest, then we're just confusing and scaring them, no? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 5/5/17 08:43, David Rowley wrote:
> How about we get the ball rolling on this in v10 and pull that part
> out of the docs. If anything that'll buy us a bit more wiggle room to
> change this in v11.
>
> I've attached a proposed patch.
If we just tell them that the thing they might have relied on might go
away, without a replacement to suggest, then we're just confusing and
scaring them, no?
We'd end up suggesting our OFFSET 0 hack as true protection. If they know for a fact that their use of CTE for its barrier properties is not supported they are also more likely to document intentional usage with something like: "-- CHANGE THIS ONCE VERSION 11 IS RELEASED!!! --" which would make finding the call sites that need to add the new "MATERIALIZED" keyword much easier.
David J.
On Tue, May 9, 2017 at 12:22 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On 5/5/17 08:43, David Rowley wrote:
> How about we get the ball rolling on this in v10 and pull that part
> out of the docs. If anything that'll buy us a bit more wiggle room to
> change this in v11.
>
> I've attached a proposed patch.
If we just tell them that the thing they might have relied on might go
away, without a replacement to suggest, then we're just confusing and
scaring them, no?We'd end up suggesting our OFFSET 0 hack as true protection. If they know for a fact that their use of CTE for its barrier properties is not supported they are also more likely to document intentional usage with something like: "-- CHANGE THIS ONCE VERSION 11 IS RELEASED!!! --" which would make finding the call sites that need to add the new "MATERIALIZED" keyword much easier.
How about adding MATERIALIZED now (in 10) as a noise word.
Give people a release to switch over before pulling the rug..
Cheers
Serge
"David G. Johnston" <david.g.johnston@gmail.com> writes: > On Tue, May 9, 2017 at 12:15 PM, Peter Eisentraut < > peter.eisentraut@2ndquadrant.com> wrote: >> If we just tell them that the thing they might have relied on might go >> away, without a replacement to suggest, then we're just confusing and >> scaring them, no? > We'd end up suggesting our OFFSET 0 hack as true protection. Considering that many of the commenters in this thread view OFFSET 0 as a vile hack that ought to go away, I can hardly see how that's an improvement. I tend to agree with Peter that there's no need to do anything until we have a committable code improvement. Documentation changes that push people towards adding OFFSET 0, without any certainty that that will be the long-term answer, do not seem like a net positive. Also, considering that this behavior has been there since 8.4, I think it's sheerest chutzpah to claim that changing the docs in v10 would materially reduce the backward-compatibility concerns for whatever we might do in v11. regards, tom lane
Also, considering that this behavior has been there since 8.4,
I think it's sheerest chutzpah to claim that changing the docs in
v10 would materially reduce the backward-compatibility concerns
for whatever we might do in v11.
No it won't, but those who are using 10 as their first version would probably be happy if this was covered in a bit more depth. Even a comment like "Unlike most other DBMS PostgreSQL presently executes the subquery in the CTE in relative isolation. It is suggested that you document any intentional usage of this optimization fence as a query planning hint so that should the default change in the future you can update the query to support whatever official syntax is implemented to retain this behavior.
We cannot really help those who have been using this since 8.4 and won't read the relevant docs because they don't need to learn anything new; but we can at least avoid subjecting newer customers. I'm not that strongly in favor of it but it would be a nice touch - even if it won't change the risk/reward calculation in any meaningful way.
David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes: > On Tue, May 9, 2017 at 12:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Also, considering that this behavior has been there since 8.4, >> I think it's sheerest chutzpah to claim that changing the docs in >> v10 would materially reduce the backward-compatibility concerns >> for whatever we might do in v11. > No it won't, but those who are using 10 as their first version would > probably be happy if this was covered in a bit more depth. I think the existing doc text is perfectly clear (while David's proposed replacement text is not). > Even a comment > like "Unlike most other DBMS PostgreSQL presently executes the subquery in > the CTE in relative isolation. It is suggested that you document any > intentional usage of this optimization fence as a query planning hint so > that should the default change in the future you can update the query to > support whatever official syntax is implemented to retain this behavior. Well, TBH that is pre-judging what (if anything) is going to be changed by a feature that we don't even have design consensus on, let alone a patch for. I don't think that's an improvement or a good service to our users; it's just promoting confusion. I think that until we do have a design and a patch, we're better off leaving the docs reading the way they have for the past eight years. I'm also a bit mystified by the apparent urgency to change something, anything, in this area when we're already past feature freeze. This would be a fit subject for discussion several months from now when v11 development opens, but right now it's just distracting us from stabilizing v10. regards, tom lane
Ilya Shkuratov <motr.ilya@ya.ru> writes: > Ok, it seems that most people in discussion are agree that removing optimization > fence is a right thing to do. > Nonetheless I still hoping to discuss the algorithm and its implementation. Yeah, so far we've mainly discussed whether to do that and how to control it, not what the actual results would be. > I suppose, in case of a single reference we can validate CTE subquery and inline it > just before SS_process_ctes() in subquery_planner() and then process remaining > CTEs as before. Yeah, something like that ought to be possible. I'd prefer to think of it as related to the work prepjointree.c does, ie pull_up_sublinks, which raises the question why we do SS_process_ctes before that not after it. You might want to start by postponing planning of CTE subqueries as late as possible, without any other changes. In particular, I think it'd be a good thing to postpone Path-to-Plan conversion of the subqueries until much later, ideally not till the final toplevel create_plan() call. This is tied up with early Plan creation for regular subqueries too, so that might have to be an all-or-nothing conversion. But we aren't really going to have much flexibility of planning for subqueries until we do that. > The case of multiple reference is more interesting. > Ideally, we would decide whether to inline just before pull_up_sublinks(), so all > the optimizations can be applied to inlined subquery. But It is impossible as we > have no information to build subquery paths and estimate they costs at this point. TBH, I would just ignore that completely, at least until the simpler case is done and committed. Trying to bite that off as part of the initial patch is likely to lead to never getting anything done at all. And I'm not exactly convinced either that it will win often enough to be worth the trouble, or that users would thank you for rewriting their queries that way. When and if we get to this, you could imagine tackling it a bit like preprocess_minmax_aggregates, or like what I did with OR clauses in <22002.1487099934@sss.pgh.pa.us>, ie just repeat a lot of the work over again to get several complete Path trees, and then pick one. This is kind of expensive and ugly but it's hard to see how to do it sensibly in a bottom-up fashion. regards, tom lane
On 05/09/2017 04:14 PM, Tom Lane wrote: > > Well, TBH that is pre-judging what (if anything) is going to be changed > by a feature that we don't even have design consensus on, let alone a > patch for. I don't think that's an improvement or a good service to > our users; it's just promoting confusion. I think that until we do have > a design and a patch, we're better off leaving the docs reading the way > they have for the past eight years. > > I'm also a bit mystified by the apparent urgency to change something, > anything, in this area when we're already past feature freeze. This > would be a fit subject for discussion several months from now when > v11 development opens, but right now it's just distracting us from > stabilizing v10. > > +1. We should not be in the business of documenting or declaring our intention of having vaporware. Say we get a patch and for some reason it gets rejected. We've had features that took two or three releases to get accepted. This is why we've historically shied away from roadmaps. And we've been bitten in the past by deprecating things only to undeprecate them later. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Ilya Shkuratov wrote > First of all, to such replacement to be valid, the CTE must be > 1. non-writable (e.g. be of form: SELECT ...), > 2. do not use VOLATILE or STABLE functions, > 3. ... (maybe there must be more restrictions?) What about simple things like this? CREATE OR REPLACE FUNCTION z(numeric) RETURNS boolean AS $$ BEGIN RETURN $1 <> 0; END; $$ LANGUAGE plpgSQL IMMUTABLE COST 1000; -- This one works: WITH T AS ( SELECT 1.0 AS v1, 0.0 AS v2 UNION ALL SELECT 3.0, 1.0 UNION ALL SELECT 2.0, 0.0 ), a AS ( SELECT * FROM tWHERE z(v2) ) SELECT * FROM aWHERE v1/v2 > 1.5; -- This one gives 'division by zero': WITH T AS ( SELECT 1.0 AS v1, 0.0 AS v2 UNION ALL SELECT 3.0, 1.0 UNION ALL SELECT 2.0, 0.0 ) SELECT * FROM ( SELECT * FROM t WHERE z(v2) ) AS aWHERE v1/v2 > 1.5; ----- WBR, Yaroslav Schekin. -- View this message in context: http://www.postgresql-archive.org/CTE-inlining-tp5958992p5961086.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
På torsdag 11. mai 2017 kl. 23:37:27, skrev Yaroslav <ladayaroslav@yandex.ru>:
Ilya Shkuratov wrote
> First of all, to such replacement to be valid, the CTE must be
> 1. non-writable (e.g. be of form: SELECT ...),
> 2. do not use VOLATILE or STABLE functions,
> 3. ... (maybe there must be more restrictions?)
What about simple things like this?
CREATE OR REPLACE FUNCTION z(numeric) RETURNS boolean AS $$
BEGIN
RETURN $1 <> 0;
END;
$$ LANGUAGE plpgSQL IMMUTABLE COST 1000;
-- This one works:
WITH T AS (
SELECT 1.0 AS v1, 0.0 AS v2
UNION ALL
SELECT 3.0, 1.0
UNION ALL
SELECT 2.0, 0.0
), a AS (
SELECT *
FROM t
WHERE z(v2)
)
SELECT *
FROM a
WHERE v1/v2 > 1.5;
-- This one gives 'division by zero':
WITH T AS (
SELECT 1.0 AS v1, 0.0 AS v2
UNION ALL
SELECT 3.0, 1.0
UNION ALL
SELECT 2.0, 0.0
)
SELECT *
FROM (
SELECT *
FROM t
WHERE z(v2)
) AS a
WHERE v1/v2 > 1.5;
From a non-hacker;
Just to se what other RDBMS are doing with CTEs; Look at slide 31 here: https://www.percona.com/live/17/sites/default/files/slides/Recursive%20Query%20Throwdown.pdf
PG is not on top wrt. CTE, but could have been if CTEs were not this "established" fence.
+1 for removing this fence and get all the possible optimization we can.
--
Andreas Joseph Krogh
> Just to se what other RDBMS are doing with CTEs; Look at slide > 31 here: https://www.percona.com/live/17/sites/default/files/slides/Recursive%20Query%20Throwdown.pdf That is taken from Markus Winand's post: https://twitter.com/MarkusWinand/status/852862475699707904 "Seems like MySQL is getting the best WITH support of all tested DBs (RECURSIVE not tested!)" -- View this message in context: http://www.postgresql-archive.org/CTE-inlining-tp5958992p5961164.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
On Tue, May 9, 2017 at 05:14:19PM -0400, Tom Lane wrote: > Ilya Shkuratov <motr.ilya@ya.ru> writes: > > Ok, it seems that most people in discussion are agree that removing optimization > > fence is a right thing to do. > > Nonetheless I still hoping to discuss the algorithm and its implementation. > > Yeah, so far we've mainly discussed whether to do that and how to control > it, not what the actual results would be. To summarize, it seems we have two options if we want to add fence control to CTEs: 1. add INLINE to disable the CTE fence 2. add MATERIALIZE to enable the CTE fence or some other keywords. I think most people prefer #2 because: * most users writing queries prefer #2 * most users assume full optimization and it seems natural to turn _off_ an optimization via a keyword * while some queries can be inlined, all queries can be materialized, so doing #1 means INLINE would be only a preference,which could be confusing Anyway, I am very glad we are considering addressing this in PG 11. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Fri, May 12, 2017 at 3:39 PM, Bruce Momjian <bruce@momjian.us> wrote: > On Tue, May 9, 2017 at 05:14:19PM -0400, Tom Lane wrote: >> Ilya Shkuratov <motr.ilya@ya.ru> writes: >> > Ok, it seems that most people in discussion are agree that removing optimization >> > fence is a right thing to do. >> > Nonetheless I still hoping to discuss the algorithm and its implementation. >> >> Yeah, so far we've mainly discussed whether to do that and how to control >> it, not what the actual results would be. > > To summarize, it seems we have two options if we want to add fence > control to CTEs: > > 1. add INLINE to disable the CTE fence > 2. add MATERIALIZE to enable the CTE fence > > or some other keywords. I think most people prefer #2 because: > > * most users writing queries prefer #2 Yeah, I think there was rough consensus on this point. I think it's fair to assume that most (or at least a majority) of queries will benefit from inlining, people would want to opt out of, rather than opt in to, generally good optimization strategies. This will hit some in people today, but this is not a backwards compatibility issue since performance is generally not really fairly described as compatibility criteria. If this feature drops we ought to warn people in the release notes though. merlin
From a user's perspective:
>I think most people prefer #2 because:
>* most users writing queries prefer #2
>* most users assume full optimization and it seems natural to turn
> _off_ an optimization via a keyword
>* while some queries can be inlined, all queries can be materialized,
> so doing #1 means INLINE would be only a preference, which could be
> confusing
>* most users writing queries prefer #2
>* most users assume full optimization and it seems natural to turn
> _off_ an optimization via a keyword
>* while some queries can be inlined, all queries can be materialized,
> so doing #1 means INLINE would be only a preference, which could be
> confusing
I completely agree with this reasoning. I have a few queries I would have to touch to add "MATERIALIZED", but the vast majority of CTE's in my codebase would get a speedup. It would allow usage of CTE's more freely than now. I currently avoid them unless it really simplifies a query because of the optimization fence.
Not that my opinion holds any weight, but the extra keyword for enabling the optimization fence is my preference. By default trying to optimize more is a good thing IMO.
>Anyway, I am very glad we are considering addressing this in PG 11.
>Anyway, I am very glad we are considering addressing this in PG 11.
Seconded, this is a sore spot for me when using Postgres, and i'd love to not have it be an issue any more.
Thanks,
-Adam
On 13 May 2017 at 08:39, Bruce Momjian <bruce@momjian.us> wrote: > To summarize, it seems we have two options if we want to add fence > control to CTEs: > > 1. add INLINE to disable the CTE fence > 2. add MATERIALIZE to enable the CTE fence I think #1 is out of the question. What would we do in cases like: WITH INLINE cte AS (SELECT random() a) SELECT * FROM cte UNION SELECT * FROM cte; I assume we won't want to inline when the CTE query contains a volatile function, and we certainly won't in cases like: WITH INLINE cte AS (DELETE FROM a RETURNING *) INSERT INTO b SELECT * from cte WHERE cte.value > 5; We'd be certain to receive complaints from disgruntled users about "Why is this not inlined when I specified INLINE?" #2 does not suffer from that. -- David Rowley http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services