Thread: WIP: pg_pretty_query
Hello last year we are spoke about reusing pretty print view code for some queries. Here is patch: this patch is really short - it is nice. But - it works only with known database objects (probably we would it) and it doesn't format subqueries well postgres=# select pg_pretty_query('select x.*, z.* from foo, foo x, x z where x.a = 10 and x.a = 30 and EXISTS(SELECT * FROM foo WHERE a = z.a)', true, false); pg_pretty_query ---------------------------------------------------------- SELECT x.a, z.a + FROM foo, foo x, x z + WHERE x.a = 10 AND x.a = 30 AND (EXISTS ( SELECT foo.a+ FROM foo + WHERE foo.a = z.a)) (1 row) Regards Pavel
Attachment
On 7 August 2012 15:14, Pavel Stehule <pavel.stehule@gmail.com> wrote: > Hello > > last year we are spoke about reusing pretty print view code for some queries. > > Here is patch: > > this patch is really short - it is nice. But - it works only with > known database objects (probably we would it) and it doesn't format > subqueries well > > > postgres=# select pg_pretty_query('select x.*, z.* from foo, foo x, x > z where x.a = 10 and x.a = 30 and EXISTS(SELECT * FROM foo WHERE a = > z.a)', true, false); > pg_pretty_query > ---------------------------------------------------------- > SELECT x.a, z.a + > FROM foo, foo x, x z + > WHERE x.a = 10 AND x.a = 30 AND (EXISTS ( SELECT foo.a+ > FROM foo + > WHERE foo.a = z.a)) > (1 row) This looks odd: postgres=# SELECT pg_pretty_query('SELECT 1, (SELECT max(a.x) + greatest(2,3) FROM generate_series(4,10,2) a(x)) FROM generate_series(1,100) GROUP BY 1 ORDER BY 1, 2 USING < NULLS FIRST', true, false); pg_pretty_query ------------------------------------------------------------------ SELECT 1, + ( SELECT max(a.x) + GREATEST(2, 3) + FROM generate_series(4, 10, 2) a(x)) + FROM generate_series(1, 100) generate_series(generate_series)+ GROUP BY 1::integer + ORDER BY 1::integer, ( SELECT max(a.x) + GREATEST(2, 3) + FROM generate_series(4,10, 2) a(x)) NULLS FIRST (1 row) USING < is removed completely (or if I used DESC, NULLS FIRST is then removed instead), "2" in the order by is expanded to its full query, and generate_series when used in FROM is repeated with its own name as a parameter. I'm also not sure about the spacing before each line. SELECT, FROM and GROUP BY all appear out of alignment from one another. Plus it would be nice if we could support something like the following style: SELECT field_one, field_two + field_three FROM my_table INNER JOIN another_table ON my_table.field_one = another_table.another_field AND another_table.valid =true WHERE field_one > 3 AND field_two < 10; But that's just a nice-to-have. -- Thom
2012/8/7 Thom Brown <thom@linux.com>: > On 7 August 2012 15:14, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> Hello >> >> last year we are spoke about reusing pretty print view code for some queries. >> >> Here is patch: >> >> this patch is really short - it is nice. But - it works only with >> known database objects (probably we would it) and it doesn't format >> subqueries well >> >> >> postgres=# select pg_pretty_query('select x.*, z.* from foo, foo x, x >> z where x.a = 10 and x.a = 30 and EXISTS(SELECT * FROM foo WHERE a = >> z.a)', true, false); >> pg_pretty_query >> ---------------------------------------------------------- >> SELECT x.a, z.a + >> FROM foo, foo x, x z + >> WHERE x.a = 10 AND x.a = 30 AND (EXISTS ( SELECT foo.a+ >> FROM foo + >> WHERE foo.a = z.a)) >> (1 row) > > This looks odd: > > postgres=# SELECT pg_pretty_query('SELECT 1, (SELECT max(a.x) + > greatest(2,3) FROM generate_series(4,10,2) a(x)) FROM > generate_series(1,100) GROUP BY 1 ORDER BY 1, 2 USING < NULLS FIRST', > true, false); > pg_pretty_query > ------------------------------------------------------------------ > SELECT 1, + > ( SELECT max(a.x) + GREATEST(2, 3) + > FROM generate_series(4, 10, 2) a(x)) + > FROM generate_series(1, 100) generate_series(generate_series)+ > GROUP BY 1::integer + > ORDER BY 1::integer, ( SELECT max(a.x) + GREATEST(2, 3) + > FROM generate_series(4, 10, 2) a(x)) NULLS FIRST > (1 row) > > USING < is removed completely (or if I used DESC, NULLS FIRST is then > removed instead), "2" in the order by is expanded to its full query, > and generate_series when used in FROM is repeated with its own name as > a parameter. I'm also not sure about the spacing before each line. > SELECT, FROM and GROUP BY all appear out of alignment from one > another. it is issue - probably we can start deserialization just from parser stage, not from rewriter stage - but then code will be significantly longer and we cannot reuse current code for pretty print view. > > Plus it would be nice if we could support something like the following style: > > SELECT > field_one, > field_two + field_three > FROM > my_table > INNER JOIN > another_table > ON > my_table.field_one = another_table.another_field > AND > another_table.valid = true > WHERE > field_one > 3 > AND > field_two < 10; > it is second issue - probably there are more lovely styles - CELKO, your and other. I am not sure if we can support more styles in core (contrib should be better maybe). Regards Pavel > But that's just a nice-to-have. > -- > Thom
On Tue, Aug 7, 2012 at 04:14:34PM +0200, Pavel Stehule wrote: > Hello > > last year we are spoke about reusing pretty print view code for some queries. > > Here is patch: > > this patch is really short - it is nice. But - it works only with > known database objects (probably we would it) and it doesn't format > subqueries well > > > postgres=# select pg_pretty_query('select x.*, z.* from foo, foo x, x > z where x.a = 10 and x.a = 30 and EXISTS(SELECT * FROM foo WHERE a = > z.a)', true, false); > pg_pretty_query > ---------------------------------------------------------- > SELECT x.a, z.a + > FROM foo, foo x, x z + > WHERE x.a = 10 AND x.a = 30 AND (EXISTS ( SELECT foo.a+ > FROM foo + > WHERE foo.a = z.a)) > (1 row) I can see this as very useful for people reporting badly-formatted queries to our email lists. Great! -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On 08/07/2012 10:14 AM, Pavel Stehule wrote: > Hello > > last year we are spoke about reusing pretty print view code for some queries. > > Here is patch: > > this patch is really short - it is nice. But - it works only with > known database objects (probably we would it) and it doesn't format > subqueries well > > > postgres=# select pg_pretty_query('select x.*, z.* from foo, foo x, x > z where x.a = 10 and x.a = 30 and EXISTS(SELECT * FROM foo WHERE a = > z.a)', true, false); > pg_pretty_query > ---------------------------------------------------------- > SELECT x.a, z.a + > FROM foo, foo x, x z + > WHERE x.a = 10 AND x.a = 30 AND (EXISTS ( SELECT foo.a+ > FROM foo + > WHERE foo.a = z.a)) > (1 row) Good stuff. That's one less item on my TODO list :-) I think we should have a version that lets you specify the wrap column, like pg_get_viewdef does. Possibly for this case we should even default it to 0 (wrap after each item) instead of 79 which it is for views. cheers andrew
On Tue, Aug 07, 2012 at 04:54:12PM +0200, Pavel Stehule wrote: > 2012/8/7 Thom Brown <thom@linux.com>: > > On 7 August 2012 15:14, Pavel Stehule <pavel.stehule@gmail.com> wrote: > >> Hello > >> > >> last year we are spoke about reusing pretty print view code for some queries. > >> > >> Here is patch: > >> > >> this patch is really short - it is nice. But - it works only with > >> known database objects (probably we would it) and it doesn't format > >> subqueries well > >> > >> > >> postgres=# select pg_pretty_query('select x.*, z.* from foo, foo x, x > >> z where x.a = 10 and x.a = 30 and EXISTS(SELECT * FROM foo WHERE a = > >> z.a)', true, false); > >> pg_pretty_query > >> ---------------------------------------------------------- > >> SELECT x.a, z.a + > >> FROM foo, foo x, x z + > >> WHERE x.a = 10 AND x.a = 30 AND (EXISTS ( SELECT foo.a+ > >> FROM foo + > >> WHERE foo.a = z.a)) > >> (1 row) > > > > This looks odd: > > > > postgres=# SELECT pg_pretty_query('SELECT 1, (SELECT max(a.x) + > > greatest(2,3) FROM generate_series(4,10,2) a(x)) FROM > > generate_series(1,100) GROUP BY 1 ORDER BY 1, 2 USING < NULLS FIRST', > > true, false); > > pg_pretty_query > > ------------------------------------------------------------------ > > SELECT 1, + > > ( SELECT max(a.x) + GREATEST(2, 3) + > > FROM generate_series(4, 10, 2) a(x)) + > > FROM generate_series(1, 100) generate_series(generate_series)+ > > GROUP BY 1::integer + > > ORDER BY 1::integer, ( SELECT max(a.x) + GREATEST(2, 3) + > > FROM generate_series(4, 10, 2) a(x)) NULLS FIRST > > (1 row) > > > > USING < is removed completely (or if I used DESC, NULLS FIRST is then > > removed instead), "2" in the order by is expanded to its full query, > > and generate_series when used in FROM is repeated with its own name as > > a parameter. I'm also not sure about the spacing before each line. > > SELECT, FROM and GROUP BY all appear out of alignment from one > > another. > > it is issue - probably we can start deserialization just from parser > stage, not from rewriter stage - but then code will be significantly > longer and we cannot reuse current code for pretty print view. > > > > > Plus it would be nice if we could support something like the following style: > > > > SELECT > > field_one, > > field_two + field_three > > FROM > > my_table > > INNER JOIN > > another_table > > ON > > my_table.field_one = another_table.another_field > > AND > > another_table.valid = true > > WHERE > > field_one > 3 > > AND > > field_two < 10; > > > > it is second issue - probably there are more lovely styles - CELKO, > your and other. I am not sure if we can support more styles in core > (contrib should be better maybe). Would it be better to have output plugins and not privilege one? Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Bruce Momjian <bruce@momjian.us> writes: > On Tue, Aug 7, 2012 at 04:14:34PM +0200, Pavel Stehule wrote: >> last year we are spoke about reusing pretty print view code for some queries. >> >> Here is patch: > I can see this as very useful for people reporting badly-formatted > queries to our email lists. Great! Allow me to express a contrary opinion: I think this is a bad idea. * First off, packaging it as a SQL function that takes and returns text seems rather awkward to use. A lot of places where you might wish for a SQL pretty-printer aren't going to have a database connection handy (think "emacs SQL mode"). * The functionality provided is not merely a pretty-printer but sort of a query validator as well: the function will fail if the query refers to any tables, columns, functions, etc you don't have in your database. For some applications that's fine, but others not so much --- in particular I suspect it's nigh useless for the use-case you mention of quickly turning an emailed query into something legible. And there's no way to separate out the reformatting functionality from that. * As per some of the complaints already registered in this thread, ruleutils.c is not designed with the goal of being a pretty-printer. Its primary charter is to support pg_dump by regurgitating rules/views in an unambiguous form, which does not necessarily look very similar to what was entered. An example of a transformation that probably nobody would want in a typical pretty-printing context is expansion of "SELECT *" lists. But again, there is really no way to turn that off. Another aspect that seems pretty undesirable for pretty-printing is loss of any comments embedded in the query text. I'm very much not in favor of trying to make ruleutils serve two masters, but that's the game we will be playing if we accept this patch. In short, the only redeeming value of this patch is that it's short. The functionality it provides is not something that anyone would come up with in a green-field design for a pretty-printer, and if we take it we are going to be faced with a whole lot of redesign requests that will be painful to implement and will carry heavy risks of breaking pg_dump and/or EXPLAIN. regards, tom lane
On 08/07/2012 02:14 PM, Tom Lane wrote: > * As per some of the complaints already registered in this thread, > ruleutils.c is not designed with the goal of being a pretty-printer. > Its primary charter is to support pg_dump by regurgitating rules/views > in an unambiguous form, which does not necessarily look very similar to > what was entered. An example of a transformation that probably nobody > would want in a typical pretty-printing context is expansion of > "SELECT *" lists. But again, there is really no way to turn that off. > Another aspect that seems pretty undesirable for pretty-printing is > loss of any comments embedded in the query text. > > I'm very much not in favor of trying to make ruleutils serve two > masters, but that's the game we will be playing if we accept this patch. I think this horse has probably bolted. If you wanted to segregate off this functionality we shouldn't have used things like pg_get_viewdef in psql, ISTM. > > In short, the only redeeming value of this patch is that it's short. > The functionality it provides is not something that anyone would come > up with in a green-field design for a pretty-printer, and if we take > it we are going to be faced with a whole lot of redesign requests that > will be painful to implement and will carry heavy risks of breaking > pg_dump and/or EXPLAIN. > > One of the challenges is to have a pretty printer that is kept in sync with the dialect that's supported. Anything that doesn't use the backend's parser seems to me to be guaranteed to get out of sync very quickly. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 08/07/2012 02:14 PM, Tom Lane wrote: >> In short, the only redeeming value of this patch is that it's short. > One of the challenges is to have a pretty printer that is kept in sync > with the dialect that's supported. Anything that doesn't use the > backend's parser seems to me to be guaranteed to get out of sync very > quickly. Sure. I think if we wanted an actually engineered solution, rather than a half-baked one, ecpg provides a good source of inspiration. One could imagine a standalone program that reads a query on stdin and emits a pretty-printed version to stdout, using a parser that is automatically generated from the backend's grammar with much the same technology used in recent ecpg releases. I think that would address most of the complaints I raised: it would be relatively painless to make use of from contexts that don't have a live database connection, it wouldn't impose any constraints related to having suitable database content available, it wouldn't apply any of the multitude of implementation-dependent transformations that the backend's parser does, and it could be built (I think) to do something more with comments than just throw them away. regards, tom lane
On 7 August 2012 20:01, Andrew Dunstan <andrew@dunslane.net> wrote: > On 08/07/2012 02:14 PM, Tom Lane wrote: >> * As per some of the complaints already registered in this thread, >> ruleutils.c is not designed with the goal of being a pretty-printer. >> Its primary charter is to support pg_dump by regurgitating rules/views >> in an unambiguous form, which does not necessarily look very similar to >> what was entered. An example of a transformation that probably nobody >> would want in a typical pretty-printing context is expansion of >> "SELECT *" lists. But again, there is really no way to turn that off. >> Another aspect that seems pretty undesirable for pretty-printing is >> loss of any comments embedded in the query text. >> >> I'm very much not in favor of trying to make ruleutils serve two >> masters, but that's the game we will be playing if we accept this patch. +1. A pretty-printer that makes the query to be cleaned-up actually undergo parse-analysis seems misconceived to me. This is a tool that begs to be written in something like Python, as a satellite project, with much greater flexibility in the format of the SQL that it outputs. > One of the challenges is to have a pretty printer that is kept in sync with > the dialect that's supported. Anything that doesn't use the backend's parser > seems to me to be guaranteed to get out of sync very quickly. I'm not convinced of that. Consider the example of cscope, a popular tool for browsing C code. Its parser was written to be "fuzzy", so it's actually perfectly usable for C++ and Java, even though that isn't actually supported, IIRC. Now, I'll grant you that that isn't a perfectly analogous situation, but it is similar in some ways. If we add a new clause to select and that bit is generically pretty-printed, is that really so bad? I have a hard time believing that a well written fuzzy pretty-printer for Postgres wouldn't deliver the vast majority of the benefits of an in-core approach, at a small fraction of the effort. You'd also be able to pretty-print plpgsql function definitions (a particularly compelling case for this kind of tool), which any approach based on the backends grammar will never be able to do (except, perhaps, if you were to do something even more complicated). -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Peter Geoghegan <peter@2ndquadrant.com> writes: > On 7 August 2012 20:01, Andrew Dunstan <andrew@dunslane.net> wrote: >> One of the challenges is to have a pretty printer that is kept in sync with >> the dialect that's supported. Anything that doesn't use the backend's parser >> seems to me to be guaranteed to get out of sync very quickly. > I'm not convinced of that. Consider the example of cscope, a popular > tool for browsing C code. Its parser was written to be "fuzzy", so > it's actually perfectly usable for C++ and Java, even though that > isn't actually supported, IIRC. Now, I'll grant you that that isn't a > perfectly analogous situation, but it is similar in some ways. Yeah. A related question here is whether you want a pretty printer that is entirely unforgiving of (what it thinks are) syntax errors in the input. It might be a lot more useful if it didn't spit up on that, but just did the best it could. regards, tom lane
2012/8/7 Tom Lane <tgl@sss.pgh.pa.us>: > Andrew Dunstan <andrew@dunslane.net> writes: >> On 08/07/2012 02:14 PM, Tom Lane wrote: >>> In short, the only redeeming value of this patch is that it's short. > >> One of the challenges is to have a pretty printer that is kept in sync >> with the dialect that's supported. Anything that doesn't use the >> backend's parser seems to me to be guaranteed to get out of sync very >> quickly. > > Sure. I think if we wanted an actually engineered solution, rather than > a half-baked one, ecpg provides a good source of inspiration. One could > imagine a standalone program that reads a query on stdin and emits a > pretty-printed version to stdout, using a parser that is automatically > generated from the backend's grammar with much the same technology used > in recent ecpg releases. I think that would address most of the > complaints I raised: it would be relatively painless to make use of from > contexts that don't have a live database connection, it wouldn't impose > any constraints related to having suitable database content available, > it wouldn't apply any of the multitude of implementation-dependent > transformations that the backend's parser does, and it could be built > (I think) to do something more with comments than just throw them away. +1 it is better, and it is allow more space for possible styling Pavel > > regards, tom lane
2012/8/7 Peter Geoghegan <peter@2ndquadrant.com>: > On 7 August 2012 20:01, Andrew Dunstan <andrew@dunslane.net> wrote: >> On 08/07/2012 02:14 PM, Tom Lane wrote: >>> * As per some of the complaints already registered in this thread, >>> ruleutils.c is not designed with the goal of being a pretty-printer. >>> Its primary charter is to support pg_dump by regurgitating rules/views >>> in an unambiguous form, which does not necessarily look very similar to >>> what was entered. An example of a transformation that probably nobody >>> would want in a typical pretty-printing context is expansion of >>> "SELECT *" lists. But again, there is really no way to turn that off. >>> Another aspect that seems pretty undesirable for pretty-printing is >>> loss of any comments embedded in the query text. >>> >>> I'm very much not in favor of trying to make ruleutils serve two >>> masters, but that's the game we will be playing if we accept this patch. > > +1. A pretty-printer that makes the query to be cleaned-up actually > undergo parse-analysis seems misconceived to me. This is a tool that > begs to be written in something like Python, as a satellite project, > with much greater flexibility in the format of the SQL that it > outputs. > >> One of the challenges is to have a pretty printer that is kept in sync with >> the dialect that's supported. Anything that doesn't use the backend's parser >> seems to me to be guaranteed to get out of sync very quickly. > > I'm not convinced of that. Consider the example of cscope, a popular > tool for browsing C code. Its parser was written to be "fuzzy", so > it's actually perfectly usable for C++ and Java, even though that > isn't actually supported, IIRC. Now, I'll grant you that that isn't a > perfectly analogous situation, but it is similar in some ways. If we > add a new clause to select and that bit is generically pretty-printed, > is that really so bad? I have a hard time believing that a well > written fuzzy pretty-printer for Postgres wouldn't deliver the vast > majority of the benefits of an in-core approach, at a small fraction > of the effort. You'd also be able to pretty-print plpgsql function > definitions (a particularly compelling case for this kind of tool), > which any approach based on the backends grammar will never be able to > do (except, perhaps, if you were to do something even more > complicated). I disagree - simply pretty printer based on technique that we know from ecpg can be very cheep and it cannot be "fuzzy" because it integrate PostgreSQL parser. Pavel > > -- > Peter Geoghegan http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training and Services