Thread: WIP: pg_pretty_query

WIP: pg_pretty_query

From
Pavel Stehule
Date:
Hello

last year we are spoke about reusing pretty print view code for some queries.

Here is patch:

this patch is really short - it is nice. But - it works only with
known database objects (probably we would it) and it doesn't format
subqueries well


postgres=# select pg_pretty_query('select x.*, z.* from foo, foo x, x
z  where x.a = 10 and x.a = 30 and EXISTS(SELECT * FROM foo WHERE a =
z.a)', true, false);
                     pg_pretty_query
----------------------------------------------------------
  SELECT x.a, z.a                                        +
    FROM foo, foo x, x z                                 +
   WHERE x.a = 10 AND x.a = 30 AND (EXISTS ( SELECT foo.a+
            FROM foo                                     +
           WHERE foo.a = z.a))
(1 row)

Regards

Pavel

Attachment

Re: WIP: pg_pretty_query

From
Thom Brown
Date:
On 7 August 2012 15:14, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hello
>
> last year we are spoke about reusing pretty print view code for some queries.
>
> Here is patch:
>
> this patch is really short - it is nice. But - it works only with
> known database objects (probably we would it) and it doesn't format
> subqueries well
>
>
> postgres=# select pg_pretty_query('select x.*, z.* from foo, foo x, x
> z  where x.a = 10 and x.a = 30 and EXISTS(SELECT * FROM foo WHERE a =
> z.a)', true, false);
>                      pg_pretty_query
> ----------------------------------------------------------
>   SELECT x.a, z.a                                        +
>     FROM foo, foo x, x z                                 +
>    WHERE x.a = 10 AND x.a = 30 AND (EXISTS ( SELECT foo.a+
>             FROM foo                                     +
>            WHERE foo.a = z.a))
> (1 row)

This looks odd:

postgres=# SELECT pg_pretty_query('SELECT 1, (SELECT max(a.x) +
greatest(2,3) FROM generate_series(4,10,2) a(x)) FROM
generate_series(1,100) GROUP BY 1 ORDER BY 1, 2 USING < NULLS FIRST',
true, false);                        pg_pretty_query
------------------------------------------------------------------ SELECT 1,
         +    ( SELECT max(a.x) + GREATEST(2, 3)                          +           FROM generate_series(4, 10, 2)
a(x))                +   FROM generate_series(1, 100) generate_series(generate_series)+  GROUP BY 1::integer
                              +  ORDER BY 1::integer, ( SELECT max(a.x) + GREATEST(2, 3)       +           FROM
generate_series(4,10, 2) a(x)) NULLS FIRST
 
(1 row)

USING < is removed completely (or if I used DESC, NULLS FIRST is then
removed instead), "2" in the order by is expanded to its full query,
and generate_series when used in FROM is repeated with its own name as
a parameter.  I'm also not sure about the spacing before each line.
SELECT, FROM and GROUP BY all appear out of alignment from one
another.

Plus it would be nice if we could support something like the following style:

SELECT   field_one,   field_two + field_three
FROM   my_table
INNER JOIN   another_table   ON       my_table.field_one = another_table.another_field   AND       another_table.valid
=true
 
WHERE   field_one > 3
AND   field_two < 10;

But that's just a nice-to-have.
-- 
Thom


Re: WIP: pg_pretty_query

From
Pavel Stehule
Date:
2012/8/7 Thom Brown <thom@linux.com>:
> On 7 August 2012 15:14, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> Hello
>>
>> last year we are spoke about reusing pretty print view code for some queries.
>>
>> Here is patch:
>>
>> this patch is really short - it is nice. But - it works only with
>> known database objects (probably we would it) and it doesn't format
>> subqueries well
>>
>>
>> postgres=# select pg_pretty_query('select x.*, z.* from foo, foo x, x
>> z  where x.a = 10 and x.a = 30 and EXISTS(SELECT * FROM foo WHERE a =
>> z.a)', true, false);
>>                      pg_pretty_query
>> ----------------------------------------------------------
>>   SELECT x.a, z.a                                        +
>>     FROM foo, foo x, x z                                 +
>>    WHERE x.a = 10 AND x.a = 30 AND (EXISTS ( SELECT foo.a+
>>             FROM foo                                     +
>>            WHERE foo.a = z.a))
>> (1 row)
>
> This looks odd:
>
> postgres=# SELECT pg_pretty_query('SELECT 1, (SELECT max(a.x) +
> greatest(2,3) FROM generate_series(4,10,2) a(x)) FROM
> generate_series(1,100) GROUP BY 1 ORDER BY 1, 2 USING < NULLS FIRST',
> true, false);
>                          pg_pretty_query
> ------------------------------------------------------------------
>   SELECT 1,                                                      +
>      ( SELECT max(a.x) + GREATEST(2, 3)                          +
>             FROM generate_series(4, 10, 2) a(x))                 +
>     FROM generate_series(1, 100) generate_series(generate_series)+
>    GROUP BY 1::integer                                           +
>    ORDER BY 1::integer, ( SELECT max(a.x) + GREATEST(2, 3)       +
>             FROM generate_series(4, 10, 2) a(x)) NULLS FIRST
> (1 row)
>
> USING < is removed completely (or if I used DESC, NULLS FIRST is then
> removed instead), "2" in the order by is expanded to its full query,
> and generate_series when used in FROM is repeated with its own name as
> a parameter.  I'm also not sure about the spacing before each line.
> SELECT, FROM and GROUP BY all appear out of alignment from one
> another.

it is issue - probably we can start deserialization just from parser
stage, not from rewriter stage - but then code will be significantly
longer and we cannot reuse current code for pretty print view.

>
> Plus it would be nice if we could support something like the following style:
>
> SELECT
>     field_one,
>     field_two + field_three
> FROM
>     my_table
> INNER JOIN
>     another_table
>     ON
>         my_table.field_one = another_table.another_field
>     AND
>         another_table.valid = true
> WHERE
>     field_one > 3
> AND
>     field_two < 10;
>

it is second issue - probably there are more lovely styles - CELKO,
your and other. I am not sure if we can support more styles in core
(contrib should be better maybe).

Regards

Pavel

> But that's just a nice-to-have.
> --
> Thom


Re: WIP: pg_pretty_query

From
Bruce Momjian
Date:
On Tue, Aug  7, 2012 at 04:14:34PM +0200, Pavel Stehule wrote:
> Hello
> 
> last year we are spoke about reusing pretty print view code for some queries.
> 
> Here is patch:
> 
> this patch is really short - it is nice. But - it works only with
> known database objects (probably we would it) and it doesn't format
> subqueries well
> 
> 
> postgres=# select pg_pretty_query('select x.*, z.* from foo, foo x, x
> z  where x.a = 10 and x.a = 30 and EXISTS(SELECT * FROM foo WHERE a =
> z.a)', true, false);
>                      pg_pretty_query
> ----------------------------------------------------------
>   SELECT x.a, z.a                                        +
>     FROM foo, foo x, x z                                 +
>    WHERE x.a = 10 AND x.a = 30 AND (EXISTS ( SELECT foo.a+
>             FROM foo                                     +
>            WHERE foo.a = z.a))
> (1 row)

I can see this as very useful for people reporting badly-formatted
queries to our email lists.  Great!

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: WIP: pg_pretty_query

From
Andrew Dunstan
Date:
On 08/07/2012 10:14 AM, Pavel Stehule wrote:
> Hello
>
> last year we are spoke about reusing pretty print view code for some queries.
>
> Here is patch:
>
> this patch is really short - it is nice. But - it works only with
> known database objects (probably we would it) and it doesn't format
> subqueries well
>
>
> postgres=# select pg_pretty_query('select x.*, z.* from foo, foo x, x
> z  where x.a = 10 and x.a = 30 and EXISTS(SELECT * FROM foo WHERE a =
> z.a)', true, false);
>                       pg_pretty_query
> ----------------------------------------------------------
>    SELECT x.a, z.a                                        +
>      FROM foo, foo x, x z                                 +
>     WHERE x.a = 10 AND x.a = 30 AND (EXISTS ( SELECT foo.a+
>              FROM foo                                     +
>             WHERE foo.a = z.a))
> (1 row)

Good stuff. That's one less item on my TODO list :-)

I think we should have a version that lets you specify the wrap column, 
like pg_get_viewdef does. Possibly for this case we should even default 
it to 0 (wrap after each item) instead of 79 which it is for views.


cheers

andrew



Re: WIP: pg_pretty_query

From
David Fetter
Date:
On Tue, Aug 07, 2012 at 04:54:12PM +0200, Pavel Stehule wrote:
> 2012/8/7 Thom Brown <thom@linux.com>:
> > On 7 August 2012 15:14, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> >> Hello
> >>
> >> last year we are spoke about reusing pretty print view code for some queries.
> >>
> >> Here is patch:
> >>
> >> this patch is really short - it is nice. But - it works only with
> >> known database objects (probably we would it) and it doesn't format
> >> subqueries well
> >>
> >>
> >> postgres=# select pg_pretty_query('select x.*, z.* from foo, foo x, x
> >> z  where x.a = 10 and x.a = 30 and EXISTS(SELECT * FROM foo WHERE a =
> >> z.a)', true, false);
> >>                      pg_pretty_query
> >> ----------------------------------------------------------
> >>   SELECT x.a, z.a                                        +
> >>     FROM foo, foo x, x z                                 +
> >>    WHERE x.a = 10 AND x.a = 30 AND (EXISTS ( SELECT foo.a+
> >>             FROM foo                                     +
> >>            WHERE foo.a = z.a))
> >> (1 row)
> >
> > This looks odd:
> >
> > postgres=# SELECT pg_pretty_query('SELECT 1, (SELECT max(a.x) +
> > greatest(2,3) FROM generate_series(4,10,2) a(x)) FROM
> > generate_series(1,100) GROUP BY 1 ORDER BY 1, 2 USING < NULLS FIRST',
> > true, false);
> >                          pg_pretty_query
> > ------------------------------------------------------------------
> >   SELECT 1,                                                      +
> >      ( SELECT max(a.x) + GREATEST(2, 3)                          +
> >             FROM generate_series(4, 10, 2) a(x))                 +
> >     FROM generate_series(1, 100) generate_series(generate_series)+
> >    GROUP BY 1::integer                                           +
> >    ORDER BY 1::integer, ( SELECT max(a.x) + GREATEST(2, 3)       +
> >             FROM generate_series(4, 10, 2) a(x)) NULLS FIRST
> > (1 row)
> >
> > USING < is removed completely (or if I used DESC, NULLS FIRST is then
> > removed instead), "2" in the order by is expanded to its full query,
> > and generate_series when used in FROM is repeated with its own name as
> > a parameter.  I'm also not sure about the spacing before each line.
> > SELECT, FROM and GROUP BY all appear out of alignment from one
> > another.
> 
> it is issue - probably we can start deserialization just from parser
> stage, not from rewriter stage - but then code will be significantly
> longer and we cannot reuse current code for pretty print view.
> 
> >
> > Plus it would be nice if we could support something like the following style:
> >
> > SELECT
> >     field_one,
> >     field_two + field_three
> > FROM
> >     my_table
> > INNER JOIN
> >     another_table
> >     ON
> >         my_table.field_one = another_table.another_field
> >     AND
> >         another_table.valid = true
> > WHERE
> >     field_one > 3
> > AND
> >     field_two < 10;
> >
> 
> it is second issue - probably there are more lovely styles - CELKO,
> your and other. I am not sure if we can support more styles in core
> (contrib should be better maybe).

Would it be better to have output plugins and not privilege one?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: WIP: pg_pretty_query

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> On Tue, Aug  7, 2012 at 04:14:34PM +0200, Pavel Stehule wrote:
>> last year we are spoke about reusing pretty print view code for some queries.
>> 
>> Here is patch:

> I can see this as very useful for people reporting badly-formatted
> queries to our email lists.  Great!

Allow me to express a contrary opinion: I think this is a bad idea.

* First off, packaging it as a SQL function that takes and returns text
seems rather awkward to use.  A lot of places where you might wish for
a SQL pretty-printer aren't going to have a database connection handy
(think "emacs SQL mode").

* The functionality provided is not merely a pretty-printer but sort
of a query validator as well: the function will fail if the query refers
to any tables, columns, functions, etc you don't have in your database.
For some applications that's fine, but others not so much --- in
particular I suspect it's nigh useless for the use-case you mention of
quickly turning an emailed query into something legible.  And there's
no way to separate out the reformatting functionality from that.

* As per some of the complaints already registered in this thread,
ruleutils.c is not designed with the goal of being a pretty-printer.
Its primary charter is to support pg_dump by regurgitating rules/views
in an unambiguous form, which does not necessarily look very similar to
what was entered.  An example of a transformation that probably nobody
would want in a typical pretty-printing context is expansion of
"SELECT *" lists.  But again, there is really no way to turn that off.
Another aspect that seems pretty undesirable for pretty-printing is
loss of any comments embedded in the query text.

I'm very much not in favor of trying to make ruleutils serve two
masters, but that's the game we will be playing if we accept this patch.

In short, the only redeeming value of this patch is that it's short.
The functionality it provides is not something that anyone would come
up with in a green-field design for a pretty-printer, and if we take
it we are going to be faced with a whole lot of redesign requests that
will be painful to implement and will carry heavy risks of breaking
pg_dump and/or EXPLAIN.
        regards, tom lane


Re: WIP: pg_pretty_query

From
Andrew Dunstan
Date:
On 08/07/2012 02:14 PM, Tom Lane wrote:

> * As per some of the complaints already registered in this thread,
> ruleutils.c is not designed with the goal of being a pretty-printer.
> Its primary charter is to support pg_dump by regurgitating rules/views
> in an unambiguous form, which does not necessarily look very similar to
> what was entered.  An example of a transformation that probably nobody
> would want in a typical pretty-printing context is expansion of
> "SELECT *" lists.  But again, there is really no way to turn that off.
> Another aspect that seems pretty undesirable for pretty-printing is
> loss of any comments embedded in the query text.
>
> I'm very much not in favor of trying to make ruleutils serve two
> masters, but that's the game we will be playing if we accept this patch.

I think this horse has probably bolted. If you wanted to segregate off 
this functionality we shouldn't have used things like pg_get_viewdef in 
psql, ISTM.


>
> In short, the only redeeming value of this patch is that it's short.
> The functionality it provides is not something that anyone would come
> up with in a green-field design for a pretty-printer, and if we take
> it we are going to be faced with a whole lot of redesign requests that
> will be painful to implement and will carry heavy risks of breaking
> pg_dump and/or EXPLAIN.
>
>             

One of the challenges is to have a pretty printer that is kept in sync 
with the dialect that's supported. Anything that doesn't use the 
backend's parser seems to me to be guaranteed to get out of sync very 
quickly.

cheers

andrew



Re: WIP: pg_pretty_query

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 08/07/2012 02:14 PM, Tom Lane wrote:
>> In short, the only redeeming value of this patch is that it's short.

> One of the challenges is to have a pretty printer that is kept in sync 
> with the dialect that's supported. Anything that doesn't use the 
> backend's parser seems to me to be guaranteed to get out of sync very 
> quickly.

Sure.  I think if we wanted an actually engineered solution, rather than
a half-baked one, ecpg provides a good source of inspiration.  One could
imagine a standalone program that reads a query on stdin and emits a
pretty-printed version to stdout, using a parser that is automatically
generated from the backend's grammar with much the same technology used
in recent ecpg releases.  I think that would address most of the
complaints I raised: it would be relatively painless to make use of from
contexts that don't have a live database connection, it wouldn't impose
any constraints related to having suitable database content available,
it wouldn't apply any of the multitude of implementation-dependent
transformations that the backend's parser does, and it could be built
(I think) to do something more with comments than just throw them away.
        regards, tom lane


Re: WIP: pg_pretty_query

From
Peter Geoghegan
Date:
On 7 August 2012 20:01, Andrew Dunstan <andrew@dunslane.net> wrote:
> On 08/07/2012 02:14 PM, Tom Lane wrote:
>> * As per some of the complaints already registered in this thread,
>> ruleutils.c is not designed with the goal of being a pretty-printer.
>> Its primary charter is to support pg_dump by regurgitating rules/views
>> in an unambiguous form, which does not necessarily look very similar to
>> what was entered.  An example of a transformation that probably nobody
>> would want in a typical pretty-printing context is expansion of
>> "SELECT *" lists.  But again, there is really no way to turn that off.
>> Another aspect that seems pretty undesirable for pretty-printing is
>> loss of any comments embedded in the query text.
>>
>> I'm very much not in favor of trying to make ruleutils serve two
>> masters, but that's the game we will be playing if we accept this patch.

+1. A pretty-printer that makes the query to be cleaned-up actually
undergo parse-analysis seems misconceived to me. This is a tool that
begs to be written in something like Python, as a satellite project,
with much greater flexibility in the format of the SQL that it
outputs.

> One of the challenges is to have a pretty printer that is kept in sync with
> the dialect that's supported. Anything that doesn't use the backend's parser
> seems to me to be guaranteed to get out of sync very quickly.

I'm not convinced of that. Consider the example of cscope, a popular
tool for browsing C code. Its parser was written to be "fuzzy", so
it's actually perfectly usable for C++ and Java, even though that
isn't actually supported, IIRC. Now, I'll grant you that that isn't a
perfectly analogous situation, but it is similar in some ways. If we
add a new clause to select and that bit is generically pretty-printed,
is that really so bad? I have a hard time believing that a well
written fuzzy pretty-printer for Postgres wouldn't deliver the vast
majority of the benefits of an in-core approach, at a small fraction
of the effort. You'd also be able to pretty-print plpgsql function
definitions (a particularly compelling case for this kind of tool),
which any approach based on the backends grammar will never be able to
do (except, perhaps, if you were to do something even more
complicated).

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


Re: WIP: pg_pretty_query

From
Tom Lane
Date:
Peter Geoghegan <peter@2ndquadrant.com> writes:
> On 7 August 2012 20:01, Andrew Dunstan <andrew@dunslane.net> wrote:
>> One of the challenges is to have a pretty printer that is kept in sync with
>> the dialect that's supported. Anything that doesn't use the backend's parser
>> seems to me to be guaranteed to get out of sync very quickly.

> I'm not convinced of that. Consider the example of cscope, a popular
> tool for browsing C code. Its parser was written to be "fuzzy", so
> it's actually perfectly usable for C++ and Java, even though that
> isn't actually supported, IIRC. Now, I'll grant you that that isn't a
> perfectly analogous situation, but it is similar in some ways.

Yeah.  A related question here is whether you want a pretty printer that
is entirely unforgiving of (what it thinks are) syntax errors in the
input.  It might be a lot more useful if it didn't spit up on that, but
just did the best it could.
        regards, tom lane


Re: WIP: pg_pretty_query

From
Pavel Stehule
Date:
2012/8/7 Tom Lane <tgl@sss.pgh.pa.us>:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 08/07/2012 02:14 PM, Tom Lane wrote:
>>> In short, the only redeeming value of this patch is that it's short.
>
>> One of the challenges is to have a pretty printer that is kept in sync
>> with the dialect that's supported. Anything that doesn't use the
>> backend's parser seems to me to be guaranteed to get out of sync very
>> quickly.
>
> Sure.  I think if we wanted an actually engineered solution, rather than
> a half-baked one, ecpg provides a good source of inspiration.  One could
> imagine a standalone program that reads a query on stdin and emits a
> pretty-printed version to stdout, using a parser that is automatically
> generated from the backend's grammar with much the same technology used
> in recent ecpg releases.  I think that would address most of the
> complaints I raised: it would be relatively painless to make use of from
> contexts that don't have a live database connection, it wouldn't impose
> any constraints related to having suitable database content available,
> it wouldn't apply any of the multitude of implementation-dependent
> transformations that the backend's parser does, and it could be built
> (I think) to do something more with comments than just throw them away.

+1

it is better, and it is allow more space for possible styling

Pavel


>
>                         regards, tom lane


Re: WIP: pg_pretty_query

From
Pavel Stehule
Date:
2012/8/7 Peter Geoghegan <peter@2ndquadrant.com>:
> On 7 August 2012 20:01, Andrew Dunstan <andrew@dunslane.net> wrote:
>> On 08/07/2012 02:14 PM, Tom Lane wrote:
>>> * As per some of the complaints already registered in this thread,
>>> ruleutils.c is not designed with the goal of being a pretty-printer.
>>> Its primary charter is to support pg_dump by regurgitating rules/views
>>> in an unambiguous form, which does not necessarily look very similar to
>>> what was entered.  An example of a transformation that probably nobody
>>> would want in a typical pretty-printing context is expansion of
>>> "SELECT *" lists.  But again, there is really no way to turn that off.
>>> Another aspect that seems pretty undesirable for pretty-printing is
>>> loss of any comments embedded in the query text.
>>>
>>> I'm very much not in favor of trying to make ruleutils serve two
>>> masters, but that's the game we will be playing if we accept this patch.
>
> +1. A pretty-printer that makes the query to be cleaned-up actually
> undergo parse-analysis seems misconceived to me. This is a tool that
> begs to be written in something like Python, as a satellite project,
> with much greater flexibility in the format of the SQL that it
> outputs.
>
>> One of the challenges is to have a pretty printer that is kept in sync with
>> the dialect that's supported. Anything that doesn't use the backend's parser
>> seems to me to be guaranteed to get out of sync very quickly.
>
> I'm not convinced of that. Consider the example of cscope, a popular
> tool for browsing C code. Its parser was written to be "fuzzy", so
> it's actually perfectly usable for C++ and Java, even though that
> isn't actually supported, IIRC. Now, I'll grant you that that isn't a
> perfectly analogous situation, but it is similar in some ways. If we
> add a new clause to select and that bit is generically pretty-printed,
> is that really so bad? I have a hard time believing that a well
> written fuzzy pretty-printer for Postgres wouldn't deliver the vast
> majority of the benefits of an in-core approach, at a small fraction
> of the effort. You'd also be able to pretty-print plpgsql function
> definitions (a particularly compelling case for this kind of tool),
> which any approach based on the backends grammar will never be able to
> do (except, perhaps, if you were to do something even more
> complicated).

I disagree - simply pretty printer based on technique that we know
from ecpg can be very cheep and it cannot be "fuzzy" because it
integrate PostgreSQL parser.

Pavel

>
> --
> Peter Geoghegan       http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training and Services