Thread: Backport "WITH ... AS MATERIALIZED" syntax to <12?
I've been struggling with how we're going to upgrade launchpad.net to PostgreSQL 12 or newer (we're currently on 10). We're one of those applications that deliberately uses CTEs as optimization fences in a few difficult places. The provision of the MATERIALIZED keyword in 12 is great, but the fact that it doesn't exist in earlier versions is awkward. We certainly don't want to upgrade our application code at the same time as upgrading the database, and dealing with performance degradation between the database upgrade and the application upgrade doesn't seem great either (not to mention that it would be hard to coordinate). That leaves us with hacking our query compiler to add the MATERIALIZED keyword only if it's running on 12 or newer, which would be possible but is pretty cumbersome. However, an alternative would be to backport the new syntax to some earlier versions. "WITH ... AS MATERIALIZED" can easily just be synonymous with "WITH ... AS" in versions prior to 12; there's no need to support "NOT MATERIALIZED" since that's explicitly requesting the new query-folding feature that only exists in 12. Would something like the attached patch against REL_11_STABLE be acceptable? I'd like to backpatch it at least as far as PostgreSQL 10. This compiles and passes regression tests. Thanks, -- Colin Watson [cjwatson@canonical.com]
Attachment
On Fri, Oct 18, 2019 at 02:21:30PM +0100, Colin Watson wrote: > However, an alternative would be to backport the new syntax to some > earlier versions. "WITH ... AS MATERIALIZED" can easily just be > synonymous with "WITH ... AS" in versions prior to 12; there's no need > to support "NOT MATERIALIZED" since that's explicitly requesting the new > query-folding feature that only exists in 12. Would something like the > attached patch against REL_11_STABLE be acceptable? I'd like to > backpatch it at least as far as PostgreSQL 10. I am afraid that new features don't gain a backpatch. This is a project policy. Back-branches should just include bug fixes. -- Michael
Attachment
>>>>> "Michael" == Michael Paquier <michael@paquier.xyz> writes: > On Fri, Oct 18, 2019 at 02:21:30PM +0100, Colin Watson wrote: >> However, an alternative would be to backport the new syntax to some >> earlier versions. "WITH ... AS MATERIALIZED" can easily just be >> synonymous with "WITH ... AS" in versions prior to 12; there's no >> need to support "NOT MATERIALIZED" since that's explicitly >> requesting the new query-folding feature that only exists in 12. >> Would something like the attached patch against REL_11_STABLE be >> acceptable? I'd like to backpatch it at least as far as PostgreSQL >> 10. Michael> I am afraid that new features don't gain a backpatch. This is Michael> a project policy. Back-branches should just include bug fixes. I do think an argument can be made for making an exception in this particular case. This wouldn't be backpatching a feature, just accepting and ignoring some of the new syntax to make upgrading easier. -- Andrew (irc:RhodiumToad)
On Sat, Oct 19, 2019 at 05:01:04AM +0100, Andrew Gierth wrote: > >>>>> "Michael" == Michael Paquier <michael@paquier.xyz> writes: > > On Fri, Oct 18, 2019 at 02:21:30PM +0100, Colin Watson wrote: > >> However, an alternative would be to backport the new syntax to some > >> earlier versions. "WITH ... AS MATERIALIZED" can easily just be > >> synonymous with "WITH ... AS" in versions prior to 12; there's no > >> need to support "NOT MATERIALIZED" since that's explicitly > >> requesting the new query-folding feature that only exists in 12. > >> Would something like the attached patch against REL_11_STABLE be > >> acceptable? I'd like to backpatch it at least as far as PostgreSQL > >> 10. > > Michael> I am afraid that new features don't gain a backpatch. This is > Michael> a project policy. Back-branches should just include bug fixes. > > I do think an argument can be made for making an exception in this > particular case. This wouldn't be backpatching a feature, just accepting > and ignoring some of the new syntax to make upgrading easier. Right, this is my position too. I'm explicitly not asking for backpatching of the CTE-inlining feature, just trying to cope with the fact that we now have to spell some particular queries differently to retain the performance characteristics we need for them. I suppose an alternative would be to add a configuration option to 12 that allows disabling inlining of CTEs cluster-wide: we could then upgrade to 12 with inlining disabled, add MATERIALIZED to the relevant queries, and then re-enable inlining. But I like that less because it would end up leaving cruft around in PostgreSQL's configuration code somewhat indefinitely for the sake of an edge case in upgrading to a particular version. -- Colin Watson [cjwatson@canonical.com]
Hi, On October 19, 2019 6:01:04 AM GMT+02:00, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: >>>>>> "Michael" == Michael Paquier <michael@paquier.xyz> writes: > > > On Fri, Oct 18, 2019 at 02:21:30PM +0100, Colin Watson wrote: > >> However, an alternative would be to backport the new syntax to some > >> earlier versions. "WITH ... AS MATERIALIZED" can easily just be > >> synonymous with "WITH ... AS" in versions prior to 12; there's no > >> need to support "NOT MATERIALIZED" since that's explicitly > >> requesting the new query-folding feature that only exists in 12. > >> Would something like the attached patch against REL_11_STABLE be > >> acceptable? I'd like to backpatch it at least as far as PostgreSQL > >> 10. > > Michael> I am afraid that new features don't gain a backpatch. This is >Michael> a project policy. Back-branches should just include bug fixes. > >I do think an argument can be made for making an exception in this >particular case. This wouldn't be backpatching a feature, just >accepting >and ignoring some of the new syntax to make upgrading easier. +1 -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Sat, Oct 19, 2019 at 11:56:56AM +0200, Andres Freund wrote: >Hi, > >On October 19, 2019 6:01:04 AM GMT+02:00, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: >>>>>>> "Michael" == Michael Paquier <michael@paquier.xyz> writes: >> >> > On Fri, Oct 18, 2019 at 02:21:30PM +0100, Colin Watson wrote: >> >> However, an alternative would be to backport the new syntax to some >> >> earlier versions. "WITH ... AS MATERIALIZED" can easily just be >> >> synonymous with "WITH ... AS" in versions prior to 12; there's no >> >> need to support "NOT MATERIALIZED" since that's explicitly >> >> requesting the new query-folding feature that only exists in 12. >> >> Would something like the attached patch against REL_11_STABLE be >> >> acceptable? I'd like to backpatch it at least as far as PostgreSQL >> >> 10. >> >> Michael> I am afraid that new features don't gain a backpatch. This is >>Michael> a project policy. Back-branches should just include bug fixes. >> >>I do think an argument can be made for making an exception in this >>particular case. This wouldn't be backpatching a feature, just >>accepting >>and ignoring some of the new syntax to make upgrading easier. > >+1 > +0.5 In general, I'm not opposed to accepting and ignoring the MATERIALIZED syntax (assuming we'd only accept AS MATERIALIZED, but not the negative variant). FWIW I'm not sure the "we don't want to upgrade application code at the same time as the database" is really tenable. I don't think we really promise that anywhere, and adding the AS MATERIALIZED seems quite mechanical. I think we could find cases where we caused worse breaks between major versions. One disadvantage is that this will increase confusion for users, who'll get used to the behavior on 12, and then they'll get confused on older releases (e.g. if you don't specify AS MATERIALIZED you'd expect the CTE to get inlined, but that won't happen). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Oct 19, 2019 at 10:22:39AM +0100, Colin Watson wrote: >On Sat, Oct 19, 2019 at 05:01:04AM +0100, Andrew Gierth wrote: >> >>>>> "Michael" == Michael Paquier <michael@paquier.xyz> writes: >> > On Fri, Oct 18, 2019 at 02:21:30PM +0100, Colin Watson wrote: >> >> However, an alternative would be to backport the new syntax to some >> >> earlier versions. "WITH ... AS MATERIALIZED" can easily just be >> >> synonymous with "WITH ... AS" in versions prior to 12; there's no >> >> need to support "NOT MATERIALIZED" since that's explicitly >> >> requesting the new query-folding feature that only exists in 12. >> >> Would something like the attached patch against REL_11_STABLE be >> >> acceptable? I'd like to backpatch it at least as far as PostgreSQL >> >> 10. >> >> Michael> I am afraid that new features don't gain a backpatch. This is >> Michael> a project policy. Back-branches should just include bug fixes. >> >> I do think an argument can be made for making an exception in this >> particular case. This wouldn't be backpatching a feature, just accepting >> and ignoring some of the new syntax to make upgrading easier. > >Right, this is my position too. I'm explicitly not asking for >backpatching of the CTE-inlining feature, just trying to cope with the >fact that we now have to spell some particular queries differently to >retain the performance characteristics we need for them. > >I suppose an alternative would be to add a configuration option to 12 >that allows disabling inlining of CTEs cluster-wide: we could then >upgrade to 12 with inlining disabled, add MATERIALIZED to the relevant >queries, and then re-enable inlining. But I like that less because it >would end up leaving cruft around in PostgreSQL's configuration code >somewhat indefinitely for the sake of an edge case in upgrading to a >particular version. I think having a GUC option was proposed and discussed while developping the feature, and it was rejected - one of the reasons being experience with similar GUCs in the past, which essentially just allowed users to postpone the fix indefinitely, and increased our maintenance burden. I wonder if an extension could do something like that, though. It can install a hook after parse analysis, so I guess it could walk the CTEs and mark them as materialized. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 10/19/19 6:48 AM, Tomas Vondra wrote: > On Sat, Oct 19, 2019 at 11:56:56AM +0200, Andres Freund wrote: >> Hi, >> >> On October 19, 2019 6:01:04 AM GMT+02:00, Andrew Gierth >> <andrew@tao11.riddles.org.uk> wrote: >>>>>>>> "Michael" == Michael Paquier <michael@paquier.xyz> writes: >>> >>> > On Fri, Oct 18, 2019 at 02:21:30PM +0100, Colin Watson wrote: >>> >> However, an alternative would be to backport the new syntax to some >>> >> earlier versions. "WITH ... AS MATERIALIZED" can easily just be >>> >> synonymous with "WITH ... AS" in versions prior to 12; there's no >>> >> need to support "NOT MATERIALIZED" since that's explicitly >>> >> requesting the new query-folding feature that only exists in 12. >>> >> Would something like the attached patch against REL_11_STABLE be >>> >> acceptable? I'd like to backpatch it at least as far as PostgreSQL >>> >> 10. >>> >>> Michael> I am afraid that new features don't gain a backpatch. This is >>> Michael> a project policy. Back-branches should just include bug fixes. >>> >>> I do think an argument can be made for making an exception in this >>> particular case. This wouldn't be backpatching a feature, just >>> accepting >>> and ignoring some of the new syntax to make upgrading easier. >> >> +1 >> > > +0.5 > > In general, I'm not opposed to accepting and ignoring the MATERIALIZED > syntax (assuming we'd only accept AS MATERIALIZED, but not the negative > variant). > > FWIW I'm not sure the "we don't want to upgrade application code at the > same time as the database" is really tenable. I'm -1 for exactly this reason. In any case, if you insist on using the same code with pre-12 and post-12 releases, this should be achievable (at least in most cases) by using the "offset 0" trick, shouldn't it? cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, 19 Oct 2019 at 10:53, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
> In general, I'm not opposed to accepting and ignoring the MATERIALIZED
> syntax (assuming we'd only accept AS MATERIALIZED, but not the negative
> variant).
>
> FWIW I'm not sure the "we don't want to upgrade application code at the
> same time as the database" is really tenable.
I'm -1 for exactly this reason.
In any case, if you insist on using the same code with pre-12 and
post-12 releases, this should be achievable (at least in most cases) by
using the "offset 0" trick, shouldn't it?
That embeds a temporary hack in the application code indefinitely.
If only we had Guido's (Python) time machine. We could go back and start accepting "AS MATERIALIZED" as noise words starting from version 7 or something.
On 10/19/19 11:10 AM, Isaac Morland wrote: > On Sat, 19 Oct 2019 at 10:53, Andrew Dunstan > <andrew.dunstan@2ndquadrant.com > <mailto:andrew.dunstan@2ndquadrant.com>> wrote: > > > > In general, I'm not opposed to accepting and ignoring the > MATERIALIZED > > syntax (assuming we'd only accept AS MATERIALIZED, but not the > negative > > variant). > > > > FWIW I'm not sure the "we don't want to upgrade application code > at the > > same time as the database" is really tenable. > > I'm -1 for exactly this reason. > > In any case, if you insist on using the same code with pre-12 and > post-12 releases, this should be achievable (at least in most > cases) by > using the "offset 0" trick, shouldn't it? > > > That embeds a temporary hack in the application code indefinitely. > > If only we had Guido's (Python) time machine. We could go back and > start accepting "AS MATERIALIZED" as noise words starting from version > 7 or something. let me know when that's materialized :-) cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Greetings, * Isaac Morland (isaac.morland@gmail.com) wrote: > That embeds a temporary hack in the application code indefinitely. ... one could argue the same about having to say AS MATERIALIZED. Thanks, Stephen
Attachment
On Sat, 19 Oct 2019 at 13:36, Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* Isaac Morland (isaac.morland@gmail.com) wrote:
> That embeds a temporary hack in the application code indefinitely.
... one could argue the same about having to say AS MATERIALIZED.
I think OFFSET 0 is a hack - the fact that it forces an optimization fence feels like an oddity. By contrast, saying AS MATERIALIZED means materialize the CTE. I suppose you could argue that the need to be able to request that is a temporary hack until query optimization improves further, but I don't think that's realistic. For the foreseeable future we will need to be able to tell the query planner that it is wrong. I mean, in principle the DB should figure out for itself which (non-constraint) indexes are needed. But I don't see any proposals to attempt to implement that.
Side note: I am frequently disappointed by the query planner. I have had many situations in which a nice simple strategy of looking up some tiny number of records in an index and then following more indices to get joined records would have worked, but instead it did a linear scan through the wrong starting table. So I'm very glad the AS MATERIALIZED now exists for when it's needed. On the other hand, I recognize that the reason I'm disappointed is because my expectations are so high: often I've written a query that joins several views together, meaning that under the covers it's really joining maybe 20 tables, and it comes back with the answer instantly. So in effect the query planner is just good enough to make me expect it to be even better than it is.
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > +0.5 > In general, I'm not opposed to accepting and ignoring the MATERIALIZED > syntax (assuming we'd only accept AS MATERIALIZED, but not the negative > variant). FWIW, I'm +0.1 or thereabouts. I'd vote -1 if the patch required introducing a new lexer keyword (even an unreserved one); but it doesn't. So it's hard to argue that there's much downside. (If we do this, I wonder if we should make the back branches parse NOT MATERIALIZED as well, and then throw a "not implemented" error rather than the unhelpful syntax error you'd get today.) (Also, if we do this, I think we should patch all supported branches. The OP's proposal to patch back to 10 has no foundation that I can see.) > FWIW I'm not sure the "we don't want to upgrade application code at the > same time as the database" is really tenable. I don't think we really > promise that anywhere, and adding the AS MATERIALIZED seems quite > mechanical. I think we could find cases where we caused worse breaks > between major versions. That's certainly true, which is why I'm only lukewarm about the proposal. > One disadvantage is that this will increase confusion for users, who'll > get used to the behavior on 12, and then they'll get confused on older > releases (e.g. if you don't specify AS MATERIALIZED you'd expect the CTE > to get inlined, but that won't happen). I'm less concerned about that aspect than about the aspect of (for instance) 11.6 and up allowing a syntax that 11.0-11.5 don't. People are likely to write code relying on this and then be surprised when it doesn't work on a slightly older server. Still, is that so much different from cases where we fix a bug that prevented some statement from working? regards, tom lane
+1 for the configuration option. Otherwise, migration is a nightmare -- so many CTEs were written specifically to use the "optimization fence" behavior. The lack of such configuration options is now a "migration fence".
On Sat, Oct 19, 2019 at 2:49 AM Colin Watson <cjwatson@canonical.com> wrote:
On Sat, Oct 19, 2019 at 05:01:04AM +0100, Andrew Gierth wrote:
> >>>>> "Michael" == Michael Paquier <michael@paquier.xyz> writes:
> > On Fri, Oct 18, 2019 at 02:21:30PM +0100, Colin Watson wrote:
> >> However, an alternative would be to backport the new syntax to some
> >> earlier versions. "WITH ... AS MATERIALIZED" can easily just be
> >> synonymous with "WITH ... AS" in versions prior to 12; there's no
> >> need to support "NOT MATERIALIZED" since that's explicitly
> >> requesting the new query-folding feature that only exists in 12.
> >> Would something like the attached patch against REL_11_STABLE be
> >> acceptable? I'd like to backpatch it at least as far as PostgreSQL
> >> 10.
>
> Michael> I am afraid that new features don't gain a backpatch. This is
> Michael> a project policy. Back-branches should just include bug fixes.
>
> I do think an argument can be made for making an exception in this
> particular case. This wouldn't be backpatching a feature, just accepting
> and ignoring some of the new syntax to make upgrading easier.
Right, this is my position too. I'm explicitly not asking for
backpatching of the CTE-inlining feature, just trying to cope with the
fact that we now have to spell some particular queries differently to
retain the performance characteristics we need for them.
I suppose an alternative would be to add a configuration option to 12
that allows disabling inlining of CTEs cluster-wide: we could then
upgrade to 12 with inlining disabled, add MATERIALIZED to the relevant
queries, and then re-enable inlining. But I like that less because it
would end up leaving cruft around in PostgreSQL's configuration code
somewhat indefinitely for the sake of an edge case in upgrading to a
particular version.
--
Colin Watson [cjwatson@canonical.com]
On Sat, Oct 19, 2019 at 8:11 AM Isaac Morland <isaac.morland@gmail.com> wrote:
That embeds a temporary hack in the application code indefinitely.
Or postpone the migration indefinitely. I saw so many times how migration in large companies was postponed because of similar "small" issues -- when the code base is large, it's easier for managers to say something like "no, we will better live without cool new features for a couple of more years than put our systems at risk due to lack of testing".
Nobody invented an excellent way to test production workloads in non-production environments yet. I know it very well because I'm also working in this direction for a couple of years. So this feature (a great one) seems to me as a major roadblock for DBAs and developers who would like to migrate to 12 and have better performance and all the new features. Ironically, including this one for the new or the updated code!
If you need to patch all your code adding "AS MATERIALIZED" (and you will need it if you want to minimize risks of performance degradation after the upgrade), then it's also a temporary hack. But much, much more expensive in implementation in large projects, and sometimes even not possible.
I do think that the lack of this configuration option will prevent some projects from migration for a long time.
Correct me if I'm wrong. Maybe somebody already thought about migration options here and have good answers? What is the best way to upgrade if you have dozens of multi-terabyte databases, a lot of legacy code and workloads where 1 minute of downtime or even performance degradation would cost a lot of money so it is not acceptable? What would be the good answers here?
On Sat, Oct 19, 2019 at 10:53 AM Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > > FWIW I'm not sure the "we don't want to upgrade application code at the > > same time as the database" is really tenable. > > I'm -1 for exactly this reason. -1 from me, too, also for this reason. I bet if we started looking we'd find many changes every year that we could justify partially or completely back-porting on similar grounds, and if we start doing that, we'll certainly screw it up sometimes, turning what should have been a smooth minor release upgrade process into one that breaks. And we'll still not satisfy the people who don't want to upgrade the application and the database at the same time, because there will always be changes where nothing like this is remotely reasonable. Also, we'll then have a lot more behavior differences between minor releases, which sounds like a bad thing to me. In particular, nobody will be happy if a pg_dump taken on version X.Y fails to restore on version X.Z. But even apart from that, it just doesn't sound like a good idea to have the user-facing behavior vary significantly across minor releases... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Oct 19, 2019 at 02:35:42PM -0400, Isaac Morland wrote: > On Sat, 19 Oct 2019 at 13:36, Stephen Frost <sfrost@snowman.net> wrote: > > Greetings, > > * Isaac Morland (isaac.morland@gmail.com) wrote: > > That embeds a temporary hack in the application code indefinitely. > > ... one could argue the same about having to say AS MATERIALIZED. > > > I think OFFSET 0 is a hack - the fact that it forces an optimization fence > feels like an oddity. By contrast, saying AS MATERIALIZED means materialize the > CTE. I suppose you could argue that the need to be able to request that is a > temporary hack until query optimization improves further, but I don't think > that's realistic. For the foreseeable future we will need to be able to tell > the query planner that it is wrong. I mean, in principle the DB should figure > out for itself which (non-constraint) indexes are needed. But I don't see any > proposals to attempt to implement that. > > Side note: I am frequently disappointed by the query planner. I have had many > situations in which a nice simple strategy of looking up some tiny number of > records in an index and then following more indices to get joined records would > have worked, but instead it did a linear scan through the wrong starting table. > So I'm very glad the AS MATERIALIZED now exists for when it's needed. On the > other hand, I recognize that the reason I'm disappointed is because my > expectations are so high: often I've written a query that joins several views > together, meaning that under the covers it's really joining maybe 20 tables, > and it comes back with the answer instantly. So in effect the query planner is > just good enough to make me expect it to be even better than it is. Well, since geqo_threshold = 12 is the default, for a 20-table join, you are using genetic query optimization (GEQO) in PG 12 without MATERIALIZED: https://www.postgresql.org/docs/12/geqo.html and GEQO assumes it would take too long to fully test all optimization possibilities, so it randomly checks just some of them. Therefore, it is no surprise you are disappointed in its output. In a way, when you are using materialized CTEs, you are doing the optimization yourself, in your SQL code, and then the table join count drops low enough that GEQO is not used and Postgres fully tests all optimization possibilities. This is behavior I had never considered --- the idea that the user is partly replacing the optimizer, and that using materialized CTEs prevents the problems that require the use of GEQO. I guess my big take-away is that not only can MATERIALIZE change the quality of query plans but it can also improve the quality of query plans if it prevents GEQO from being used. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Sat, Oct 19, 2019 at 11:52 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > I wonder if an extension could do something like that, though. It can > install a hook after parse analysis, so I guess it could walk the CTEs > and mark them as materialized. I wonder if the existing pg_hint_plan extension could be extended to do that using something like /*+ MATERIALIZE */. That'd presumably be ignored when not installed/not understood etc. I've never used pg_hint_plan myself and don't know how or how well it works, but it look like it supports Oracle-style hints hidden in comments like /*+ HashJoin(a b) */ rather than SQL Server-style hints that are in the SQL grammar itself like SELECT ... FROM a HASH JOIN b.