Thread: [PATCH] Add --syntax to postgres for SQL syntax checking

[PATCH] Add --syntax to postgres for SQL syntax checking

From
Josef Šimánek
Date:
Hello!

Inspired by Simon Riggs' keynote talk at PGCounf.eu 2023 sharing list
of ideas for PostgreSQL
(https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
quick patch to do SQL syntax validation.

It is also heavily inspired by the "ruby -c" command, useful to check
syntax of Ruby programs without executing them.

For now, to keep it simple and to open discussion, I have added new
"--syntax" option into "postgres" command, since it is currently the
only one using needed parser dependency (at least per my
understanding). I tried to add this into psql or separate pg_syntax
commands, but parser is not exposed in "postgres_fe.h" and including
backend into those tools would not make most likely sense. Also syntax
could vary per backend, it makes sense to keep it in there.

It expects input on STDIN, prints out error if any and prints out
summary message (Valid SQL/Invalid SQL). On valid input it exits with
0 (success), otherwise it exits with 1 (error).

Example usage:

$ echo "SELECT 1" | src/backend/postgres --syntax
Valid SQL

$ echo "SELECT 1abc" | src/backend/postgres --syntax
ERROR:  trailing junk after numeric literal at or near "1a" at character 8
Invalid SQL

$ cat ../src/test/regress/sql/alter_operator.sql | src/backend/postgres --syntax
Valid SQL

$ cat ../src/test/regress/sql/advisory_lock.sql | src/backend/postgres --syntax
ERROR:  syntax error at or near "\" at character 99
Invalid SQL

This could be useful for manual script checks, automated script checks
and code editor integrations.

Notice it just parses the SQL, it doesn't detect any "runtime"
problems like unexisting table names, etc.

I have various ideas around this (like exposing similar functionality
directly in SQL using custom function like pg_check_syntax), but I
would like to get some feedback first.

What do you think?
enhnace
PS: I wasn't able to find any automated tests for "postgres" command
to enhance with, are there any?

PS2: Patch could be found at https://github.com/simi/postgres/pull/8 as well.

Attachment

Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
Laurenz Albe
Date:
On Fri, 2023-12-15 at 13:21 +0100, Josef Šimánek wrote:
> Inspired by Simon Riggs' keynote talk at PGCounf.eu 2023 sharing list
> of ideas for PostgreSQL
> (https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
> quick patch to do SQL syntax validation.
>
> What do you think?

I like the idea.  But what will happen if the SQL statement references
tables or other objects, since we have no database?

Yours,
Laurenz Albe



Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
Josef Šimánek
Date:


Dne pá 15. 12. 2023 14:09 uživatel Laurenz Albe <laurenz.albe@cybertec.at> napsal:
On Fri, 2023-12-15 at 13:21 +0100, Josef Šimánek wrote:
> Inspired by Simon Riggs' keynote talk at PGCounf.eu 2023 sharing list
> of ideas for PostgreSQL
> (https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
> quick patch to do SQL syntax validation.
>
> What do you think?

I like the idea.  But what will happen if the SQL statement references
tables or other objects, since we have no database?

It checks just the identifier is valid from parser perspective, like it is valid table name.


Yours,
Laurenz Albe

Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
Pavel Stehule
Date:


Dne pá 15. 12. 2023 14:14 uživatel Josef Šimánek <josef.simanek@gmail.com> napsal:


Dne pá 15. 12. 2023 14:09 uživatel Laurenz Albe <laurenz.albe@cybertec.at> napsal:
On Fri, 2023-12-15 at 13:21 +0100, Josef Šimánek wrote:
> Inspired by Simon Riggs' keynote talk at PGCounf.eu 2023 sharing list
> of ideas for PostgreSQL
> (https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
> quick patch to do SQL syntax validation.
>
> What do you think?

I like the idea.  But what will happen if the SQL statement references
tables or other objects, since we have no database?

It checks just the identifier is valid from parser perspective, like it is valid table name.

There can by two levels, like plpgsql or like pllgsql_check

Regards

Pavel


Yours,
Laurenz Albe

Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
Tom Lane
Date:
Laurenz Albe <laurenz.albe@cybertec.at> writes:
> On Fri, 2023-12-15 at 13:21 +0100, Josef Šimánek wrote:
>> Inspired by Simon Riggs' keynote talk at PGCounf.eu 2023 sharing list
>> of ideas for PostgreSQL
>> (https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
>> quick patch to do SQL syntax validation.
>> 
>> What do you think?

> I like the idea.  But what will happen if the SQL statement references
> tables or other objects, since we have no database?

This seems like a fairly useless wart to me.  What does it do that
you can't do better with existing facilities (psql etc)?

In the big picture a command line switch in the postgres executable
doesn't feel like the right place for this.  There's no good reason
to assume that the server executable will be installed where you want
this capability; not to mention the possibility of version skew
between that executable and whatever installation you're actually
running on.

Another thing I don't like is that this exposes to the user what ought
to be purely an implementation detail, namely the division of labor
between gram.y (raw_parser()) and the rest of the parser.  There are
checks that a user would probably see as "syntax checks" that don't
happen in gram.y, and conversely there are some things we reject there
that seem more like semantic than syntax issues.

            regards, tom lane



Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
Josef Šimánek
Date:
pá 15. 12. 2023 v 15:50 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>
> Laurenz Albe <laurenz.albe@cybertec.at> writes:
> > On Fri, 2023-12-15 at 13:21 +0100, Josef Šimánek wrote:
> >> Inspired by Simon Riggs' keynote talk at PGCounf.eu 2023 sharing list
> >> of ideas for PostgreSQL
> >> (https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
> >> quick patch to do SQL syntax validation.
> >>
> >> What do you think?
>
> > I like the idea.  But what will happen if the SQL statement references
> > tables or other objects, since we have no database?
>
> This seems like a fairly useless wart to me.  What does it do that
> you can't do better with existing facilities (psql etc)?

Per my experience, this is mostly useful during development to catch
syntax errors super early. For SELECT/INSERT/UPDATE/DELETE queries, it
is usually enough to prepend with EXPLAIN and run. But EXPLAIN doesn't
support all SQL like DDL statements. Let's say I have a long SQL
script I'm working on and there is typo in the middle like ALTERR
instead of ALTER. Is there any simple way to detect this without
actually running the statement in psql or other existing facilities?
This check could be simply combined with editor capabilities to be run
on each SQL file save to get quick feedback on this kind of mistakes
for great developer experience.

> In the big picture a command line switch in the postgres executable
> doesn't feel like the right place for this.  There's no good reason
> to assume that the server executable will be installed where you want
> this capability; not to mention the possibility of version skew
> between that executable and whatever installation you're actually
> running on.

This is mostly intended for SQL developers and CI systems where
PostgreSQL backend is usually installed and in the actual version
needed. I agree postgres is not the best place (even it makes
partially sense to me), but as I mentioned, I wasn't able to craft a
quick patch with a better place to put this in. What would you
recommend? Separate executable like pg_syntaxcheck?

> Another thing I don't like is that this exposes to the user what ought
> to be purely an implementation detail, namely the division of labor
> between gram.y (raw_parser()) and the rest of the parser.  There are
> checks that a user would probably see as "syntax checks" that don't
> happen in gram.y, and conversely there are some things we reject there
> that seem more like semantic than syntax issues.

I have no big insight into SQL parsing. Can you please share examples
of given concerns? Is there anything better than raw_parser() for this
purpose?

>                         regards, tom lane



Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
"David G. Johnston"
Date:
On Fri, Dec 15, 2023 at 8:05 AM Josef Šimánek <josef.simanek@gmail.com> wrote:
pá 15. 12. 2023 v 15:50 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>
> Laurenz Albe <laurenz.albe@cybertec.at> writes:
> > On Fri, 2023-12-15 at 13:21 +0100, Josef Šimánek wrote:
> >> Inspired by Simon Riggs' keynote talk at PGCounf.eu 2023 sharing list
> >> of ideas for PostgreSQL
> >> (https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
> >> quick patch to do SQL syntax validation.
> >>
> >> What do you think?
>
> > I like the idea.  But what will happen if the SQL statement references
> > tables or other objects, since we have no database?
>
> This seems like a fairly useless wart to me.  What does it do that
> you can't do better with existing facilities (psql etc)?

Per my experience, this is mostly useful during development to catch
syntax errors super early.

I'd suggest helping this project get production capable since it already is trying to integrate with the development ecosystem you describe here.


I agree that developing this as a new executable for the purpose is needed in order to best conform to existing conventions.

David J.

Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
Josef Šimánek
Date:
pá 15. 12. 2023 v 16:16 odesílatel David G. Johnston
<david.g.johnston@gmail.com> napsal:
>
> On Fri, Dec 15, 2023 at 8:05 AM Josef Šimánek <josef.simanek@gmail.com> wrote:
>>
>> pá 15. 12. 2023 v 15:50 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>> >
>> > Laurenz Albe <laurenz.albe@cybertec.at> writes:
>> > > On Fri, 2023-12-15 at 13:21 +0100, Josef Šimánek wrote:
>> > >> Inspired by Simon Riggs' keynote talk at PGCounf.eu 2023 sharing list
>> > >> of ideas for PostgreSQL
>> > >> (https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
>> > >> quick patch to do SQL syntax validation.
>> > >>
>> > >> What do you think?
>> >
>> > > I like the idea.  But what will happen if the SQL statement references
>> > > tables or other objects, since we have no database?
>> >
>> > This seems like a fairly useless wart to me.  What does it do that
>> > you can't do better with existing facilities (psql etc)?
>>
>> Per my experience, this is mostly useful during development to catch
>> syntax errors super early.
>
>
> I'd suggest helping this project get production capable since it already is trying to integrate with the development
ecosystemyou describe here. 
>
> https://github.com/supabase/postgres_lsp

Indeed LSP is the one of the targets of this feature. Currently it is
using https://github.com/pganalyze/libpg_query under the hood probably
because of the same reasons I have described (parser is not available
in public APIs of postgres_fe.h or libpq). Exposing basic parser
capabilities in postgres binary itself and also on SQL level by
pg_check_syntax function can prevent the need of "hacking" pg parser
to be accessible outside of server binary.

> I agree that developing this as a new executable for the purpose is needed in order to best conform to existing
conventions.
>
> David J.
>



Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
"David G. Johnston"
Date:
On Fri, Dec 15, 2023 at 8:20 AM Josef Šimánek <josef.simanek@gmail.com> wrote:
(parser is not available
in public APIs of postgres_fe.h or libpq).

What about building "libpg" that does expose and exports some public APIs for the parser?  We can include a reference CLI implementation for basic usage of the functionality while leaving the actual language server project outside of core.

David J.

Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
Josef Šimánek
Date:
pá 15. 12. 2023 v 16:32 odesílatel David G. Johnston
<david.g.johnston@gmail.com> napsal:
>
> On Fri, Dec 15, 2023 at 8:20 AM Josef Šimánek <josef.simanek@gmail.com> wrote:
>>
>> (parser is not available
>> in public APIs of postgres_fe.h or libpq).
>
>
> What about building "libpg" that does expose and exports some public APIs for the parser?  We can include a reference
CLIimplementation for basic usage of the functionality while leaving the actual language server project outside of
core.

Language server (LSP) can just benefit from this feature, but it
doesn't cover all possibilities since LSP is meant for one purpose ->
run in developer's code editor. Actual syntax check is more generic,
able to cover CI checks and more. I would not couple this feature and
LSP, LSP can just benefit from it (and it is usually built in a way
that uses other tools and packs them into LSP). Exposing this kind of
SQL check doesn't mean something LSP related being implemented in
core. LSP can just benefit from this.

Exposing parser to libpg seems good idea, but I'm not sure how simple
that could  be done since during my journey I have found out there are
a lot of dependencies which are not present in usual frontend code per
my understanding like memory contexts and removal of those
(decoupling) would be huge project IMHO.

> David J.



Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
Tomas Vondra
Date:

On 12/15/23 16:38, Josef Šimánek wrote:
> pá 15. 12. 2023 v 16:32 odesílatel David G. Johnston
> <david.g.johnston@gmail.com> napsal:
>>
>> On Fri, Dec 15, 2023 at 8:20 AM Josef Šimánek <josef.simanek@gmail.com> wrote:
>>>
>>> (parser is not available
>>> in public APIs of postgres_fe.h or libpq).
>>
>>
>> What about building "libpg" that does expose and exports some public APIs for the parser?  We can include a
referenceCLI implementation for basic usage of the functionality while leaving the actual language server project
outsideof core.
 
> 
> Language server (LSP) can just benefit from this feature, but it
> doesn't cover all possibilities since LSP is meant for one purpose ->
> run in developer's code editor. Actual syntax check is more generic,
> able to cover CI checks and more. I would not couple this feature and
> LSP, LSP can just benefit from it (and it is usually built in a way
> that uses other tools and packs them into LSP). Exposing this kind of
> SQL check doesn't mean something LSP related being implemented in
> core. LSP can just benefit from this.
> 

I don't know enough about LSP to have a good idea how to implement this
for PG, but my assumption would be we'd have some sort of library
exposing "parser" to frontend tools, and also an in-core binary using
that library (say, src/bin/pg_syntax_check). And LSP would use that
parser library too ...

I think there's about 0% chance we'd add this to "postgres" binary.

> Exposing parser to libpg seems good idea, but I'm not sure how simple
> that could  be done since during my journey I have found out there are
> a lot of dependencies which are not present in usual frontend code per
> my understanding like memory contexts and removal of those
> (decoupling) would be huge project IMHO.
> 

You're right the grammar/parser expects a lot of backend infrastructure,
so making it available to frontend is going to be challenging. But I
doubt there's a better way than starting with gram.y and either removing
or adding the missing pieces (maybe only a mock version of it).

I'm not a bison expert, but considering your goal seems to be a basic
syntax check, maybe you could simply rip out most of the bits depending
on backend stuff, or maybe replace them with some trivial no-op code?

But as Tom mentioned, the question is how far gram.y gets you. There's
plenty of ereport(ERROR) calls in src/backend/parser/*.c our users might
easily consider as parse errors ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
Josef Šimánek
Date:
ne 25. 2. 2024 v 23:24 odesílatel Tomas Vondra
<tomas.vondra@enterprisedb.com> napsal:
>
>
>
> On 12/15/23 16:38, Josef Šimánek wrote:
> > pá 15. 12. 2023 v 16:32 odesílatel David G. Johnston
> > <david.g.johnston@gmail.com> napsal:
> >>
> >> On Fri, Dec 15, 2023 at 8:20 AM Josef Šimánek <josef.simanek@gmail.com> wrote:
> >>>
> >>> (parser is not available
> >>> in public APIs of postgres_fe.h or libpq).
> >>
> >>
> >> What about building "libpg" that does expose and exports some public APIs for the parser?  We can include a
referenceCLI implementation for basic usage of the functionality while leaving the actual language server project
outsideof core. 
> >
> > Language server (LSP) can just benefit from this feature, but it
> > doesn't cover all possibilities since LSP is meant for one purpose ->
> > run in developer's code editor. Actual syntax check is more generic,
> > able to cover CI checks and more. I would not couple this feature and
> > LSP, LSP can just benefit from it (and it is usually built in a way
> > that uses other tools and packs them into LSP). Exposing this kind of
> > SQL check doesn't mean something LSP related being implemented in
> > core. LSP can just benefit from this.
> >
>
> I don't know enough about LSP to have a good idea how to implement this
> for PG, but my assumption would be we'd have some sort of library
> exposing "parser" to frontend tools, and also an in-core binary using
> that library (say, src/bin/pg_syntax_check). And LSP would use that
> parser library too ...
>
> I think there's about 0% chance we'd add this to "postgres" binary.

Exposing parser to frontend tools makes no sense to me and even if it
would, it is a huge project not worth to be done just because of
syntax check. I can update the patch to prepare a new binary, but
still on the backend side. This syntax check should be equivalent to
running a server locally, running a query and caring only about
parsing part finished successfully. In my thinking, this belongs to
backend tools.

> > Exposing parser to libpg seems good idea, but I'm not sure how simple
> > that could  be done since during my journey I have found out there are
> > a lot of dependencies which are not present in usual frontend code per
> > my understanding like memory contexts and removal of those
> > (decoupling) would be huge project IMHO.
> >
>
> You're right the grammar/parser expects a lot of backend infrastructure,
> so making it available to frontend is going to be challenging. But I
> doubt there's a better way than starting with gram.y and either removing
> or adding the missing pieces (maybe only a mock version of it).
>
> I'm not a bison expert, but considering your goal seems to be a basic
> syntax check, maybe you could simply rip out most of the bits depending
> on backend stuff, or maybe replace them with some trivial no-op code?
>
> But as Tom mentioned, the question is how far gram.y gets you. There's
> plenty of ereport(ERROR) calls in src/backend/parser/*.c our users might
> easily consider as parse errors ...
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company



Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
Jelte Fennema-Nio
Date:
On Sun, 25 Feb 2024 at 23:34, Josef Šimánek <josef.simanek@gmail.com> wrote:
> Exposing parser to frontend tools makes no sense to me

Not everyone seems to agree with that, it's actually already done by
Lukas from pganalyze: https://github.com/pganalyze/libpg_query



Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
Josef Šimánek
Date:
po 26. 2. 2024 v 8:20 odesílatel Jelte Fennema-Nio <postgres@jeltef.nl> napsal:
>
> On Sun, 25 Feb 2024 at 23:34, Josef Šimánek <josef.simanek@gmail.com> wrote:
> > Exposing parser to frontend tools makes no sense to me
>
> Not everyone seems to agree with that, it's actually already done by
> Lukas from pganalyze: https://github.com/pganalyze/libpg_query

I did a quick look. That's clearly amazing work, but it is not parser
being exposed to frontend (refactored). It is (per my understanding)
backend code isolated to minimum to be able to parse query. It is
still bound to individual backend version and to backend source code.
And it is close to my effort (I was about to start with a simpler
version not providing tokens, just the result), but instead of copying
files from backend into separate project and shave off to minimum,
provide the same tool with backend utils directly.



Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
Robert Haas
Date:
On Sun, Feb 25, 2024 at 5:24 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> I think there's about 0% chance we'd add this to "postgres" binary.

Several people have taken this position, so I'm going to mark
https://commitfest.postgresql.org/48/4704/ as Rejected.

My own opinion is that syntax checking is a useful thing to expose,
but I don't believe that this is a useful way to expose it. I think
this comment that Tom made upthread is right on target:

# Another thing I don't like is that this exposes to the user what ought
# to be purely an implementation detail, namely the division of labor
# between gram.y (raw_parser()) and the rest of the parser.  There are
# checks that a user would probably see as "syntax checks" that don't
# happen in gram.y, and conversely there are some things we reject there
# that seem more like semantic than syntax issues.

I think that what that means in practice is that, while this patch may
seem to give reasonable results in simple tests, as soon as you try to
do slightly more complicated things with it, it's going to give weird
results, either failing to flag things that you'd expect it to flag,
or flagging things you'd expect it not to flag. Fixing that would be
either impossible or a huge amount of work depending on your point of
view. If you take the point of view that we need to adjust things so
that the raw parser reports all the things that ought to be reported
by a tool like this and none of the things that it shouldn't, then
it's probably just a huge amount of work. If you take the point of
view that what goes into the raw parser and what goes into parse
analysis ought to be an implementation decision untethered to what a
tool like this ought to report, then fixing the problems would be
impossible, or at least, it would amount to throwing away this patch
and starting over. I think the latter point of view, which Tom has
already taken, would be the more prevalent view among hackers by far,
but even if the former view prevailed, who is going to do all that
work?

I strongly suspect that doing something useful in this area requires
about two orders of magnitude more code than are included in this
patch, and a completely different design. If it actually worked well
to do something this simple, somebody probably would have done it
already. In fact, there are already tools out there for validation,
like https://github.com/okbob/plpgsql_check for example. That tool
doesn't do exactly the same thing as this patch is trying to do, but
it does do other kinds of validation, and it's 19k lines of code, vs.
the 45 lines of code in this patch, which I think reinforces the point
that you need to do something much more complicated than this to
create real value.

Also, the patch as proposed, besides being 45 lines, also has zero
lines of comments, no test cases, no documentation, and doesn't follow
the PostgreSQL coding standards. I'm not saying that to be mean, nor
am I suggesting that Josef should go fix that stuff. It's perfectly
reasonable to propose a small patch without a lot of research to see
what people think -- but now we have the answer to that question:
people think it won't work. So Josef should now decide to either give
up, or try a new approach, or if he's really sure that all the
feedback that has been given so far is completely wrong, he can try to
demonstrate that the patch does all kinds of wonderful things with
very few disadvantages. But I think if he does that last, he's going
to find that it's not really possible, because I'm pretty sure that
Tom is right.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
Josef Šimánek
Date:
st 15. 5. 2024 v 19:43 odesílatel Robert Haas <robertmhaas@gmail.com> napsal:
>
> On Sun, Feb 25, 2024 at 5:24 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> > I think there's about 0% chance we'd add this to "postgres" binary.
>
> Several people have taken this position, so I'm going to mark
> https://commitfest.postgresql.org/48/4704/ as Rejected.
>
> My own opinion is that syntax checking is a useful thing to expose,
> but I don't believe that this is a useful way to expose it. I think
> this comment that Tom made upthread is right on target:
>
> # Another thing I don't like is that this exposes to the user what ought
> # to be purely an implementation detail, namely the division of labor
> # between gram.y (raw_parser()) and the rest of the parser.  There are
> # checks that a user would probably see as "syntax checks" that don't
> # happen in gram.y, and conversely there are some things we reject there
> # that seem more like semantic than syntax issues.
>
> I think that what that means in practice is that, while this patch may
> seem to give reasonable results in simple tests, as soon as you try to
> do slightly more complicated things with it, it's going to give weird
> results, either failing to flag things that you'd expect it to flag,
> or flagging things you'd expect it not to flag. Fixing that would be
> either impossible or a huge amount of work depending on your point of
> view. If you take the point of view that we need to adjust things so
> that the raw parser reports all the things that ought to be reported
> by a tool like this and none of the things that it shouldn't, then
> it's probably just a huge amount of work. If you take the point of
> view that what goes into the raw parser and what goes into parse
> analysis ought to be an implementation decision untethered to what a
> tool like this ought to report, then fixing the problems would be
> impossible, or at least, it would amount to throwing away this patch
> and starting over. I think the latter point of view, which Tom has
> already taken, would be the more prevalent view among hackers by far,
> but even if the former view prevailed, who is going to do all that
> work?
>
> I strongly suspect that doing something useful in this area requires
> about two orders of magnitude more code than are included in this
> patch, and a completely different design. If it actually worked well
> to do something this simple, somebody probably would have done it
> already. In fact, there are already tools out there for validation,
> like https://github.com/okbob/plpgsql_check for example. That tool
> doesn't do exactly the same thing as this patch is trying to do, but
> it does do other kinds of validation, and it's 19k lines of code, vs.
> the 45 lines of code in this patch, which I think reinforces the point
> that you need to do something much more complicated than this to
> create real value.
>
> Also, the patch as proposed, besides being 45 lines, also has zero
> lines of comments, no test cases, no documentation, and doesn't follow
> the PostgreSQL coding standards. I'm not saying that to be mean, nor
> am I suggesting that Josef should go fix that stuff. It's perfectly
> reasonable to propose a small patch without a lot of research to see
> what people think -- but now we have the answer to that question:
> people think it won't work. So Josef should now decide to either give
> up, or try a new approach, or if he's really sure that all the
> feedback that has been given so far is completely wrong, he can try to
> demonstrate that the patch does all kinds of wonderful things with
> very few disadvantages. But I think if he does that last, he's going
> to find that it's not really possible, because I'm pretty sure that
> Tom is right.

I'm totally OK to mark this as rejected and indeed 45 lines were just
minimal patch to create PoC (I have coded this during last PGConf.eu
lunch break) and mainly to start discussion.

I'm not sure everyone in this thread understands the reason for this
patch, which is clearly my fault, since I have failed to explain. Main
idea is to make a tool to validate query can be parsed, that's all.
Similar to running "EXPLAIN query", but not caring about the result
and not caring about the DB structure (ignoring missing tables, ...),
just checking it was successfully executed. This definitely belongs to
the server side and not to the client side, it is just a tool to
validate that for this running PostgreSQL backend it is a "parseable"
query.

I'm not giving up on this, but I hear there are various problems to
explore. If I understand it well, just running the parser to query
doesn't guarantee the query is valid, since it can fail later for
various reasons (I mean other than missing table, ...). I wasn't aware
of that. Also exposing this inside postgres binary seems
controversial. I had an idea to expose parser result at SQL level with
a new command (similar to EXPLAIN), but you'll need running PostgreSQL
backend to be able to use this capability, which is against one of the
original ideas. On the otherside PostgreSQL exposes a lot of "meta"
functionality already and this could be a nice addition.

I'll revisit the discussion again and try to submit another try once
I'll get more context and experience.

Thanks everyone for constructive comments!

> --
> Robert Haas
> EDB: http://www.enterprisedb.com



Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
Josef Šimánek
Date:
pá 15. 12. 2023 v 15:50 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>
> Laurenz Albe <laurenz.albe@cybertec.at> writes:
> > On Fri, 2023-12-15 at 13:21 +0100, Josef Šimánek wrote:
> >> Inspired by Simon Riggs' keynote talk at PGCounf.eu 2023 sharing list
> >> of ideas for PostgreSQL
> >> (https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
> >> quick patch to do SQL syntax validation.
> >>
> >> What do you think?
>
> > I like the idea.  But what will happen if the SQL statement references
> > tables or other objects, since we have no database?
>
> This seems like a fairly useless wart to me.

this hurts :'(

>
> In the big picture a command line switch in the postgres executable
> doesn't feel like the right place for this.  There's no good reason
> to assume that the server executable will be installed where you want
> this capability; not to mention the possibility of version skew
> between that executable and whatever installation you're actually
> running on.
>
> Another thing I don't like is that this exposes to the user what ought
> to be purely an implementation detail, namely the division of labor
> between gram.y (raw_parser()) and the rest of the parser.  There are
> checks that a user would probably see as "syntax checks" that don't
> happen in gram.y, and conversely there are some things we reject there
> that seem more like semantic than syntax issues.
>
>                         regards, tom lane



=?UTF-8?B?Sm9zZWYgxaBpbcOhbmVr?= <josef.simanek@gmail.com> writes:
> I'm not sure everyone in this thread understands the reason for this
> patch, which is clearly my fault, since I have failed to explain. Main
> idea is to make a tool to validate query can be parsed, that's all.
> Similar to running "EXPLAIN query", but not caring about the result
> and not caring about the DB structure (ignoring missing tables, ...),
> just checking it was successfully executed. This definitely belongs to
> the server side and not to the client side, it is just a tool to
> validate that for this running PostgreSQL backend it is a "parseable"
> query.

The thing that was bothering me most about this is that I don't
understand why that's a useful check.  If I meant to type

    UPDATE mytab SET mycol = 42;

and instead I type

    UPDATEE mytab SET mycol = 42;

your proposed feature would catch that; great.  But if I type

    UPDATE mytabb SET mycol = 42;

it won't.  How does that make sense?  I'm not entirely sure where
to draw the line about what a "syntax check" should catch, but this
seems a bit south of what I'd want in a syntax-checking editor.

BTW, if you do feel that a pure grammar check is worthwhile, you
should look at the ecpg preprocessor, which does more or less that
with the SQL portions of its input.  ecpg could be a better model
to follow because it doesn't bring all the dependencies of the server
and so is much more likely to appear in a client-side installation.
It's kind of an ugly, unmaintained mess at the moment, sadly.

The big knock on doing this client-side is that there might be
version skew compared to the server you're using --- but if you
are not doing anything beyond a grammar-level check then your
results are pretty approximate anyway, ISTM.  We've not heard
anything suggesting that version skew is a huge problem for
ecpg users.

            regards, tom lane



Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
walther@technowledgy.de
Date:
Tom Lane:
> The thing that was bothering me most about this is that I don't
> understand why that's a useful check.  If I meant to type
> 
>     UPDATE mytab SET mycol = 42;
> 
> and instead I type
> 
>     UPDATEE mytab SET mycol = 42;
> 
> your proposed feature would catch that; great.  But if I type
> 
>     UPDATE mytabb SET mycol = 42;
> 
> it won't.  How does that make sense?  I'm not entirely sure where
> to draw the line about what a "syntax check" should catch, but this
> seems a bit south of what I'd want in a syntax-checking editor.
> 
> BTW, if you do feel that a pure grammar check is worthwhile, you
> should look at the ecpg preprocessor, which does more or less that
> with the SQL portions of its input.  ecpg could be a better model
> to follow because it doesn't bring all the dependencies of the server
> and so is much more likely to appear in a client-side installation.
> It's kind of an ugly, unmaintained mess at the moment, sadly.

Would working with ecpg allow to get back a parse tree of the query to 
do stuff with that?

This is really what is missing for the ecosystem. A libpqparser for 
tools to use: Formatters, linters, query rewriters, simple syntax 
checkers... they are all missing access to postgres' own parser.

Best,

Wolfgang




Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
Josef Šimánek
Date:
st 15. 5. 2024 v 20:39 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>
> =?UTF-8?B?Sm9zZWYgxaBpbcOhbmVr?= <josef.simanek@gmail.com> writes:
> > I'm not sure everyone in this thread understands the reason for this
> > patch, which is clearly my fault, since I have failed to explain. Main
> > idea is to make a tool to validate query can be parsed, that's all.
> > Similar to running "EXPLAIN query", but not caring about the result
> > and not caring about the DB structure (ignoring missing tables, ...),
> > just checking it was successfully executed. This definitely belongs to
> > the server side and not to the client side, it is just a tool to
> > validate that for this running PostgreSQL backend it is a "parseable"
> > query.
>
> The thing that was bothering me most about this is that I don't
> understand why that's a useful check.  If I meant to type
>
>         UPDATE mytab SET mycol = 42;
>
> and instead I type
>
>         UPDATEE mytab SET mycol = 42;
>
> your proposed feature would catch that; great.  But if I type
>
>         UPDATE mytabb SET mycol = 42;
>
> it won't.  How does that make sense?  I'm not entirely sure where
> to draw the line about what a "syntax check" should catch, but this
> seems a bit south of what I'd want in a syntax-checking editor.

This is exactly where the line is drawn. My motivation is not to use
this feature for syntax check in editor (even could be used to find
those banalities). I'm looking to improve automation to be able to
detect those banalities as early as possible. Let's say there is
complex CI automation configuring and starting PostgreSQL backend,
loading some data, ... and in the end all this is useless, since there
is this kind of simple mistake like UPDATEE. I would like to detect
this problem as early as possible and stop the complex CI pipeline to
save time and also to save resources (= money) by failing super-early
and reporting back. This kind of mistake could be simply introduced by
like wrongly resolved git conflict, human typing error ...

This kind of mistake (typo, ...) can be usually spotted super early.
In compiled languages during compilation, in interpreted languages
(like Ruby) at program start (since code is parsed as one of the first
steps). There is no such early detection possible for PostgreSQL
currently IMHO.

> BTW, if you do feel that a pure grammar check is worthwhile, you
> should look at the ecpg preprocessor, which does more or less that
> with the SQL portions of its input.  ecpg could be a better model
> to follow because it doesn't bring all the dependencies of the server
> and so is much more likely to appear in a client-side installation.
> It's kind of an ugly, unmaintained mess at the moment, sadly.
>
> The big knock on doing this client-side is that there might be
> version skew compared to the server you're using --- but if you
> are not doing anything beyond a grammar-level check then your
> results are pretty approximate anyway, ISTM.  We've not heard
> anything suggesting that version skew is a huge problem for
> ecpg users.

Thanks for the info, I'll check.

>                         regards, tom lane



walther@technowledgy.de writes:
> Tom Lane:
>> BTW, if you do feel that a pure grammar check is worthwhile, you
>> should look at the ecpg preprocessor, which does more or less that
>> with the SQL portions of its input.

> Would working with ecpg allow to get back a parse tree of the query to 
> do stuff with that?

No, ecpg isn't interested in building a syntax tree.

> This is really what is missing for the ecosystem. A libpqparser for 
> tools to use: Formatters, linters, query rewriters, simple syntax 
> checkers... they are all missing access to postgres' own parser.

To get to that, you'd need some kind of agreement on what the syntax
tree is.  I doubt our existing implementation would be directly useful
to very many tools, and even if it is, do they want to track constant
version-to-version changes?

            regards, tom lane



Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
Robert Haas
Date:
On Wed, May 15, 2024 at 2:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The thing that was bothering me most about this is that I don't
> understand why that's a useful check.  If I meant to type
>
>         UPDATE mytab SET mycol = 42;
>
> and instead I type
>
>         UPDATEE mytab SET mycol = 42;
>
> your proposed feature would catch that; great.  But if I type
>
>         UPDATE mytabb SET mycol = 42;
>
> it won't.  How does that make sense?  I'm not entirely sure where
> to draw the line about what a "syntax check" should catch, but this
> seems a bit south of what I'd want in a syntax-checking editor.

I don't agree with this, actually. The first wrong example can never
be valid, while the second one can be valid given the right table
definitions. That lines up quite nicely with the distinction between
parsing and parse analysis. There is a problem with actually getting
all the way there, I'm fairly sure, because we've got thousands of
lines of gram.y and thousands of lines of parse analysis code that
weren't all written with the idea of making a crisp distinction here.
For example, I'd like both EXPLAIN (WAFFLES) SELECT 1 and EXPLAIN
WAFFLES SELECT 1 to be flagged as syntactically invalid, and with
things as they are that would not happen. Even for plannable
statements I would not be at all surprised to hear that there are a
bunch of corner cases that we'd get wrong.

But I don't understand the idea that the concept doesn't make sense. I
think it is perfectly reasonable to imagine a world in which the
initial parsing takes care of reporting everything that can be
determined by static analysis without knowing anything about catalog
contents, and parse analysis checks all the things that require
catalog access, and you can run the first set of checks and then
decide whether to proceed further. I think if I were designing a new
system from scratch, I'd definitely want to set it up that way, and I
think moving our existing system in that direction would probably let
us clean up a variety of warts along the way. Really, the only
argument I see against it is that getting from where we are to there
would be a gigantic amount of work for the value we'd derive.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
Robert Haas
Date:
On Wed, May 15, 2024 at 2:12 PM Josef Šimánek <josef.simanek@gmail.com> wrote:
> I'm totally OK to mark this as rejected and indeed 45 lines were just
> minimal patch to create PoC (I have coded this during last PGConf.eu
> lunch break) and mainly to start discussion.

Thanks for understanding.

> I'm not sure everyone in this thread understands the reason for this
> patch, which is clearly my fault, since I have failed to explain. Main
> idea is to make a tool to validate query can be parsed, that's all.
> Similar to running "EXPLAIN query", but not caring about the result
> and not caring about the DB structure (ignoring missing tables, ...),
> just checking it was successfully executed. This definitely belongs to
> the server side and not to the client side, it is just a tool to
> validate that for this running PostgreSQL backend it is a "parseable"
> query.

I don't think it's at all obvious that this belongs on the server side
rather than the client side. I think it could be done in either place,
or both. I just think we don't have the infrastructure to do it
cleanly, at present.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
walther@technowledgy.de
Date:
Tom Lane:
>> This is really what is missing for the ecosystem. A libpqparser for
>> tools to use: Formatters, linters, query rewriters, simple syntax
>> checkers... they are all missing access to postgres' own parser.
> 
> To get to that, you'd need some kind of agreement on what the syntax
> tree is.  I doubt our existing implementation would be directly useful
> to very many tools, and even if it is, do they want to track constant
> version-to-version changes?

Correct, on top of what the syntax tree currently has, one would 
probably need:
- comments
- locations (line number / character) for everything, including those of 
comments

Otherwise it's impossible to print proper SQL again without losing 
information.

And then on top of that, to be really really useful, you'd need to be 
able to parse partial statements, too, to support all kinds of "language 
server" applications.

Tracking version-to-version changes is exactly the reason why it would 
be good to have that from upstream, imho. New syntax is added in 
(almost?) every release and everyone outside core trying to write their 
own parser and staying up2date with **all** the new syntax.. will 
eventually fail.

Yes, there could be changes to the produced parse tree as well and you'd 
also need to adjust, for example, your SQL-printers. But it should be 
easier to stay up2date than right now.

Best,

Wolfgang



Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, May 15, 2024 at 2:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The thing that was bothering me most about this is that I don't
>> understand why that's a useful check. ...

> But I don't understand the idea that the concept doesn't make sense.

Sorry: "make sense" was a poorly chosen phrase there.  What I was
doubting, and continue to doubt, is that 100% checking of what
you can check without catalog access and 0% checking of the rest
is a behavior that end users will find especially useful.

> I think it is perfectly reasonable to imagine a world in which the
> initial parsing takes care of reporting everything that can be
> determined by static analysis without knowing anything about catalog
> contents, and parse analysis checks all the things that require
> catalog access, and you can run the first set of checks and then
> decide whether to proceed further. I think if I were designing a new
> system from scratch, I'd definitely want to set it up that way, and I
> think moving our existing system in that direction would probably let
> us clean up a variety of warts along the way. Really, the only
> argument I see against it is that getting from where we are to there
> would be a gigantic amount of work for the value we'd derive.

I'm less enthusiatic about this than you are.  I think it would likely
produce a slower and less maintainable system.  Slower because we'd
need more passes over the query: what parse analysis does today would
have to be done in at least two separate steps.  Less maintainable
because knowledge about certain things would end up being more spread
around the system.  Taking your example of getting syntax checking to
recognize invalid EXPLAIN keywords: right now there's just one piece
of code that knows what those options are, but this'd require two.
Also, "run the first set of checks and then decide whether to proceed
further" seems like optimizing the system for broken queries over
valid ones, which I don't think is an appropriate goal.

Now, I don't say that there'd be *no* benefit to reorganizing the
system that way.  But it wouldn't be an unalloyed win, and so I
share your bottom line that the costs would be out of proportion
to the benefits.

            regards, tom lane



Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
"David G. Johnston"
Date:
On Wed, May 15, 2024 at 12:18 PM <walther@technowledgy.de> wrote:
Tom Lane:
>> This is really what is missing for the ecosystem. A libpqparser for
>> tools to use: Formatters, linters, query rewriters, simple syntax
>> checkers... they are all missing access to postgres' own parser.
>
> To get to that, you'd need some kind of agreement on what the syntax
> tree is.  I doubt our existing implementation would be directly useful
> to very many tools, and even if it is, do they want to track constant
> version-to-version changes?

Correct, on top of what the syntax tree currently has, one would
probably need:
- comments
- locations (line number / character) for everything, including those of
comments


I'm with the original patch idea at this point.  A utility that simply runs text through the parser, not parse analysis, and answers the question: "Were you able to parse this?" has both value and seems like something that can be patched into core in a couple of hundred lines, not thousands, as has already been demonstrated.

Sure, other questions are valid and other goals exist in the ecosystem, but that doesn't diminish this one which is sufficiently justified for my +1 on the idea.

Now, in my ideal world something like this could be made as an extension so that it can work on older versions and not have to be maintained by core.  And likely grow more features over time.  Is there anything fundamental about this that prevents it being implemented in an extension and, if so, what can we add to core in v18 to alleviate that limitation?

David J.

Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
Josef Šimánek
Date:
st 15. 5. 2024 v 21:28 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Wed, May 15, 2024 at 2:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> The thing that was bothering me most about this is that I don't
> >> understand why that's a useful check. ...
>
> > But I don't understand the idea that the concept doesn't make sense.
>
> Sorry: "make sense" was a poorly chosen phrase there.  What I was
> doubting, and continue to doubt, is that 100% checking of what
> you can check without catalog access and 0% checking of the rest
> is a behavior that end users will find especially useful.

But that's completely different feature which is not exclusive and
shouldn't block this other feature to do only exactly as specified.

> > I think it is perfectly reasonable to imagine a world in which the
> > initial parsing takes care of reporting everything that can be
> > determined by static analysis without knowing anything about catalog
> > contents, and parse analysis checks all the things that require
> > catalog access, and you can run the first set of checks and then
> > decide whether to proceed further. I think if I were designing a new
> > system from scratch, I'd definitely want to set it up that way, and I
> > think moving our existing system in that direction would probably let
> > us clean up a variety of warts along the way. Really, the only
> > argument I see against it is that getting from where we are to there
> > would be a gigantic amount of work for the value we'd derive.
>
> I'm less enthusiatic about this than you are.  I think it would likely
> produce a slower and less maintainable system.  Slower because we'd
> need more passes over the query: what parse analysis does today would
> have to be done in at least two separate steps.  Less maintainable
> because knowledge about certain things would end up being more spread
> around the system.  Taking your example of getting syntax checking to
> recognize invalid EXPLAIN keywords: right now there's just one piece
> of code that knows what those options are, but this'd require two.
> Also, "run the first set of checks and then decide whether to proceed
> further" seems like optimizing the system for broken queries over
> valid ones, which I don't think is an appropriate goal.
>
> Now, I don't say that there'd be *no* benefit to reorganizing the
> system that way.  But it wouldn't be an unalloyed win, and so I
> share your bottom line that the costs would be out of proportion
> to the benefits.
>
>                         regards, tom lane



Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
Josef Šimánek
Date:
st 15. 5. 2024 v 21:33 odesílatel David G. Johnston
<david.g.johnston@gmail.com> napsal:
>
> On Wed, May 15, 2024 at 12:18 PM <walther@technowledgy.de> wrote:
>>
>> Tom Lane:
>> >> This is really what is missing for the ecosystem. A libpqparser for
>> >> tools to use: Formatters, linters, query rewriters, simple syntax
>> >> checkers... they are all missing access to postgres' own parser.
>> >
>> > To get to that, you'd need some kind of agreement on what the syntax
>> > tree is.  I doubt our existing implementation would be directly useful
>> > to very many tools, and even if it is, do they want to track constant
>> > version-to-version changes?
>>
>> Correct, on top of what the syntax tree currently has, one would
>> probably need:
>> - comments
>> - locations (line number / character) for everything, including those of
>> comments
>>
>
> I'm with the original patch idea at this point.  A utility that simply runs text through the parser, not parse
analysis,and answers the question: "Were you able to parse this?" has both value and seems like something that can be
patchedinto core in a couple of hundred lines, not thousands, as has already been demonstrated. 
>
> Sure, other questions are valid and other goals exist in the ecosystem, but that doesn't diminish this one which is
sufficientlyjustified for my +1 on the idea. 
>
> Now, in my ideal world something like this could be made as an extension so that it can work on older versions and
nothave to be maintained by core.  And likely grow more features over time.  Is there anything fundamental about this
thatprevents it being implemented in an extension and, if so, what can we add to core in v18 to alleviate that
limitation?

Like extension providing additional binary? Or what kind of extension
do you mean? One of the original ideas was to be able to do so (parse
query) without running postgres itself. Could extension be useful
without running postgres backend?

> David J.
>



Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
"David G. Johnston"
Date:
On Wed, May 15, 2024 at 12:35 PM Josef Šimánek <josef.simanek@gmail.com> wrote:
st 15. 5. 2024 v 21:33 odesílatel David G. Johnston
<david.g.johnston@gmail.com> napsal:

> Now, in my ideal world something like this could be made as an extension so that it can work on older versions and not have to be maintained by core.  And likely grow more features over time.  Is there anything fundamental about this that prevents it being implemented in an extension and, if so, what can we add to core in v18 to alleviate that limitation?

Like extension providing additional binary? Or what kind of extension
do you mean? One of the original ideas was to be able to do so (parse
query) without running postgres itself. Could extension be useful
without running postgres backend?


Pushing beyond my experience level here...but yes a separately installed binary (extension is being used conceptually here, this doesn't involve "create extension") that can inspect pg_config to find out where backend/postmaster library objects are installed and link to them.

David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> Now, in my ideal world something like this could be made as an extension so
> that it can work on older versions and not have to be maintained by core.
> And likely grow more features over time.  Is there anything fundamental
> about this that prevents it being implemented in an extension and, if so,
> what can we add to core in v18 to alleviate that limitation?

It'd be pretty trivial to create a function that takes a string
and runs it through raw_parser --- I've got such things laying
about for microbenchmarking purposes, in fact.  But the API that'd
present for tools such as editors is enormously different from
the proposed patch: there would need to be a running server and
they'd need to be able to log into it, plus there are more minor
concerns like having to wrap the string in valid quoting.

Now on the plus side, once you'd bought into that environment,
it'd be equally trivial to offer alternatives like "run raw
parsing and parse analysis, but don't run the query".  I continue
to maintain that that's the set of checks you'd really want in a
lot of use-cases.

            regards, tom lane



Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
Robert Haas
Date:
On Wed, May 15, 2024 at 3:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Sorry: "make sense" was a poorly chosen phrase there.  What I was
> doubting, and continue to doubt, is that 100% checking of what
> you can check without catalog access and 0% checking of the rest
> is a behavior that end users will find especially useful.

You might be right, but my guess is that you're wrong. Syntax
highlighting is very popular, and seems like a more sophisticated
version of that same concept. I don't personally like it or use it
myself, but I bet I'm hugely in the minority these days. And EDB
certainly gets customer requests for syntax checking of various kinds;
whether this particular kind would get more or less traction than
other things is somewhat moot in view of the low likelihood of it
actually happening.

> I'm less enthusiatic about this than you are.  I think it would likely
> produce a slower and less maintainable system.  Slower because we'd
> need more passes over the query: what parse analysis does today would
> have to be done in at least two separate steps.  Less maintainable
> because knowledge about certain things would end up being more spread
> around the system.  Taking your example of getting syntax checking to
> recognize invalid EXPLAIN keywords: right now there's just one piece
> of code that knows what those options are, but this'd require two.
> Also, "run the first set of checks and then decide whether to proceed
> further" seems like optimizing the system for broken queries over
> valid ones, which I don't think is an appropriate goal.

Well, we've talked before about other problems that stem from the fact
that DDL doesn't have a clear separation between parse analysis and
execution. I vaguely imagine that it would be valuable to clean that
up for various reasons. But I haven't really thought it through, so
I'm prepared to concede that it might have various downsides that
aren't presently obvious to me.

> Now, I don't say that there'd be *no* benefit to reorganizing the
> system that way.  But it wouldn't be an unalloyed win, and so I
> share your bottom line that the costs would be out of proportion
> to the benefits.

I'm glad we agree on that much, and don't feel a compelling need to
litigate the remaining differences between our positions, unless you
really want to. I'm just telling you what I think, and I'm pleased
that we think as similarly as we do, despite remaining differences.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
"David G. Johnston"
Date:
On Wed, May 15, 2024 at 1:00 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, May 15, 2024 at 3:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Sorry: "make sense" was a poorly chosen phrase there.  What I was
> doubting, and continue to doubt, is that 100% checking of what
> you can check without catalog access and 0% checking of the rest
> is a behavior that end users will find especially useful.

You might be right, but my guess is that you're wrong. Syntax
highlighting is very popular, and seems like a more sophisticated
version of that same concept.

The proposed seems distinctly less sophisticated though would be a starting point.

I think the documentation for --syntax check would read something like this:

postgres --syntax-check={filename | -}

Performs a pass over the lexical structure of the script supplied in filename or, if - is specified, standard input, then exits.  The exit code is zero if no errors were found, otherwise it is 1, and the errors, at full verbosity, are printed to standard error.  This does not involve reading the configuration file and, by extension, will not detect errors that require knowledge of a database schema, including the system catalogs, to manifest.  There will be no false positives, but due to the prior rule, false negatives must be factored into its usage.  Thus this option is most useful as an initial triage point, quickly rejecting SQL code without requiring a running PostgreSQL service.

Note: This is exposed as a convenient way to get access to the outcome of performing a raw parse within the specific version of the postgres binary being executed.  The specific implementation of that parse is still non-public.  Likewise, PostgreSQL doesn't itself have a use for a raw parse output beyond sending it to post-parse analysis.  All of the catalog required checks, and potentially some that don't obviously need the catalogs, happen in this post-parse step; which the syntax checking API does not expose.  In short, the API here doesn't include any guarantees regarding the specific errors one should expect to see output, only the no false positive test result of performing the first stage raw parse.

David J.

If in core I would still want to expose this as say a contrib module binary instead of hacking it into postgres.  It would be our first server program entry there.

Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
"David G. Johnston"
Date:
On Wed, May 15, 2024 at 6:35 PM David G. Johnston <david.g.johnston@gmail.com> wrote:

If in core I would still want to expose this as say a contrib module binary instead of hacking it into postgres.  It would be our first server program entry there.


Sorry for self-reply but:

Maybe name it "pg_script_check" with a, for now mandatory, "--syntax-only" option that enables this raw parse mode.  Leaving room for executing this in an environment where there is, or it can launch, a running instance that then performs post-parse analysis as well.

David J.

Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
Laurenz Albe
Date:
On Wed, 2024-05-15 at 14:39 -0400, Tom Lane wrote:
> The thing that was bothering me most about this is that I don't
> understand why that's a useful check.  If I meant to type
>
>     UPDATE mytab SET mycol = 42;
>
> and instead I type
>
>     UPDATEE mytab SET mycol = 42;
>
> your proposed feature would catch that; great.  But if I type
>
>     UPDATE mytabb SET mycol = 42;
>
> it won't.  How does that make sense?

It makes sense to me.  I see a clear distinction between "this is a
valid SQL statement" and "this is an SQL statement that will run on
a specific database with certain objects in it".

To me, "correct syntax" is the former.

Yours,
Laurenz Albe



Re: [PATCH] Add --syntax to postgres for SQL syntax checking

From
Peter Eisentraut
Date:
On 15.05.24 21:05, Robert Haas wrote:
> I don't think it's at all obvious that this belongs on the server side
> rather than the client side. I think it could be done in either place,
> or both. I just think we don't have the infrastructure to do it
> cleanly, at present.

I think if you're going to do a syntax-check-with-catalog-lookup mode, 
you need authentication and access control.  The mode without catalog 
lookup doesn't need that.  But it might be better to offer both modes 
through a similar interface (or at least plan ahead for that).