Thread: Extending PostgreSQL with a Domain-Specific Language (DSL) -Development
Dear Hackers
I am interested in implementing my own Domain Specific Language (DSL) using PostgreSQL internals. Originally, the plan was not to use PostgreSQL and I had developed a grammar and used ANTLRv4 for parser work and general early development.
Initially, I was hoping for a scenario where I could have PostgreSQL's parser to change grammar (e.g. SET parser_language=SQL vs. SET parser_language=myDSL) in which case my ANTLRv4 project would override the PostgreSQL parser module. I guess another direction that my project could take is to extend PostgreSQL's SQL parser to factor in my DSL keywords and requirements.
To make matters more complicated, this version of ANTLR does not support code generation to C, but it does support generation to C++. Integrating the generated C++ code requires making it friendly to PostgreSQL e.g. using Plain Old Data Structures as described here https://www.postgresql.org/docs/9.0/extend-cpp.html, which seems to be suggesting to me that I may be using the wrong approach towards my goal.
I would be grateful if anyone could provide any general advice or pointers regarding my approach, for example regarding development effort, so that the development with PostgreSQL internals can be smooth and of a high quality. Maybe somebody has come across another DSL attempt which used PostgreSQL and that I could follow as a reference?
Thanks in advance.
Best,
Tom
On Fri, Jul 05, 2019 at 07:55:15AM +0000, Tom Mercha wrote: >Dear Hackers > >I am interested in implementing my own Domain Specific Language (DSL) >using PostgreSQL internals. Originally, the plan was not to use >PostgreSQL and I had developed a grammar and used ANTLRv4 for parser >work and general early development. > >Initially, I was hoping for a scenario where I could have PostgreSQL's >parser to change grammar (e.g. SET parser_language=SQL vs. SET >parser_language=myDSL) in which case my ANTLRv4 project would override >the PostgreSQL parser module. I guess another direction that my project >could take is to extend PostgreSQL's SQL parser to factor in my DSL >keywords and requirements. > >To make matters more complicated, this version of ANTLR does not >support code generation to C, but it does support generation to C++. >Integrating the generated C++ code requires making it friendly to >PostgreSQL e.g. using Plain Old Data Structures as described here >https://www.postgresql.org/docs/9.0/extend-cpp.html, which seems to be >suggesting to me that I may be using the wrong approach towards my >goal. > >I would be grateful if anyone could provide any general advice or >pointers regarding my approach, for example regarding development >effort, so that the development with PostgreSQL internals can be smooth >and of a high quality. Maybe somebody has come across another DSL >attempt which used PostgreSQL and that I could follow as a reference? > I might be missing something, but it seems like you intend to replace the SQL grammar we have with something else. It's not clear to me what would be the point of doing that, and it definitely looks like a huge amount of work - e.g. we don't have any support for switching between two distinct grammars the way you envision, and just that alone seems like a multi-year project. And if you don't have that capability, all external tools kinda stop working. Good luck with running such database. What I'd look at first is implementing the grammar as a procedural language (think PL/pgSQL, pl/perl etc.) implementing whatever you expect from your DSL. And it's not like you'd have to wrap everything in functions, because we have anonymous DO blocks. So you could do: DO LANGUAGE mydsl $$ ... whatever my dsl allows ... $$; It's still a fair amount of code to implement this (both the PL handler and the DSL implementation), but it's orders of magnitude simpler than what you described. See https://www.postgresql.org/docs/current/plhandler.html for details about how to write a language handler. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I might be missing something, but it seems like you intend to replace
the SQL grammar we have with something else. It's not clear to me what
would be the point of doing that, and it definitely looks like a huge
amount of work - e.g. we don't have any support for switching between
two distinct grammars the way you envision, and just that alone seems
like a multi-year project. And if you don't have that capability, allexternal tools kinda stop working. Good luck with running such database.
I was considering having two distinct grammars as an option - thanks for indicating the effort involved. At the end of the day I want both my DSL and the PostgreSQL grammars to coexist. Is extending PostgreSQL's grammar with my own through the PostgreSQL extension infrastructure worth consideration or is it also difficult to develop? Could you suggest any reference material on this topic?
What I'd look at first is implementing the grammar as a procedural
language (think PL/pgSQL, pl/perl etc.) implementing whatever you expect
from your DSL. And it's not like you'd have to wrap everything in
functions, because we have anonymous DO blocks.
Thanks for pointing out this direction! I think I will indeed adopt this approach especially if directly extending PostgreSQL grammar would be difficult.
Regards
Tom
From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Sent: 05 July 2019 20:48
To: Tom Mercha
Cc: pgsql-hackers@postgresql.org
Subject: Re: Extending PostgreSQL with a Domain-Specific Language (DSL) - Development
Sent: 05 July 2019 20:48
To: Tom Mercha
Cc: pgsql-hackers@postgresql.org
Subject: Re: Extending PostgreSQL with a Domain-Specific Language (DSL) - Development
On Fri, Jul 05, 2019 at 07:55:15AM +0000, Tom Mercha wrote:
>Dear Hackers
>
>I am interested in implementing my own Domain Specific Language (DSL)
>using PostgreSQL internals. Originally, the plan was not to use
>PostgreSQL and I had developed a grammar and used ANTLRv4 for parser
>work and general early development.
>
>Initially, I was hoping for a scenario where I could have PostgreSQL's
>parser to change grammar (e.g. SET parser_language=SQL vs. SET
>parser_language=myDSL) in which case my ANTLRv4 project would override
>the PostgreSQL parser module. I guess another direction that my project
>could take is to extend PostgreSQL's SQL parser to factor in my DSL
>keywords and requirements.
>
>To make matters more complicated, this version of ANTLR does not
>support code generation to C, but it does support generation to C++.
>Integrating the generated C++ code requires making it friendly to
>PostgreSQL e.g. using Plain Old Data Structures as described here
>https://www.postgresql.org/docs/9.0/extend-cpp.html, which seems to be
>suggesting to me that I may be using the wrong approach towards my
>goal.
>
>I would be grateful if anyone could provide any general advice or
>pointers regarding my approach, for example regarding development
>effort, so that the development with PostgreSQL internals can be smooth
>and of a high quality. Maybe somebody has come across another DSL
>attempt which used PostgreSQL and that I could follow as a reference?
>
I might be missing something, but it seems like you intend to replace
the SQL grammar we have with something else. It's not clear to me what
would be the point of doing that, and it definitely looks like a huge
amount of work - e.g. we don't have any support for switching between
two distinct grammars the way you envision, and just that alone seems
like a multi-year project. And if you don't have that capability, all
external tools kinda stop working. Good luck with running such database.
What I'd look at first is implementing the grammar as a procedural
language (think PL/pgSQL, pl/perl etc.) implementing whatever you expect
from your DSL. And it's not like you'd have to wrap everything in
functions, because we have anonymous DO blocks. So you could do:
DO LANGUAGE mydsl $$
... whatever my dsl allows ...
$$;
It's still a fair amount of code to implement this (both the PL handler
and the DSL implementation), but it's orders of magnitude simpler than
what you described.
See https://www.postgresql.org/docs/current/plhandler.html for details
about how to write a language handler.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>Dear Hackers
>
>I am interested in implementing my own Domain Specific Language (DSL)
>using PostgreSQL internals. Originally, the plan was not to use
>PostgreSQL and I had developed a grammar and used ANTLRv4 for parser
>work and general early development.
>
>Initially, I was hoping for a scenario where I could have PostgreSQL's
>parser to change grammar (e.g. SET parser_language=SQL vs. SET
>parser_language=myDSL) in which case my ANTLRv4 project would override
>the PostgreSQL parser module. I guess another direction that my project
>could take is to extend PostgreSQL's SQL parser to factor in my DSL
>keywords and requirements.
>
>To make matters more complicated, this version of ANTLR does not
>support code generation to C, but it does support generation to C++.
>Integrating the generated C++ code requires making it friendly to
>PostgreSQL e.g. using Plain Old Data Structures as described here
>https://www.postgresql.org/docs/9.0/extend-cpp.html, which seems to be
>suggesting to me that I may be using the wrong approach towards my
>goal.
>
>I would be grateful if anyone could provide any general advice or
>pointers regarding my approach, for example regarding development
>effort, so that the development with PostgreSQL internals can be smooth
>and of a high quality. Maybe somebody has come across another DSL
>attempt which used PostgreSQL and that I could follow as a reference?
>
I might be missing something, but it seems like you intend to replace
the SQL grammar we have with something else. It's not clear to me what
would be the point of doing that, and it definitely looks like a huge
amount of work - e.g. we don't have any support for switching between
two distinct grammars the way you envision, and just that alone seems
like a multi-year project. And if you don't have that capability, all
external tools kinda stop working. Good luck with running such database.
What I'd look at first is implementing the grammar as a procedural
language (think PL/pgSQL, pl/perl etc.) implementing whatever you expect
from your DSL. And it's not like you'd have to wrap everything in
functions, because we have anonymous DO blocks. So you could do:
DO LANGUAGE mydsl $$
... whatever my dsl allows ...
$$;
It's still a fair amount of code to implement this (both the PL handler
and the DSL implementation), but it's orders of magnitude simpler than
what you described.
See https://www.postgresql.org/docs/current/plhandler.html for details
about how to write a language handler.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
First of all, it's pretty difficult to follow the discussion when it's not clear what's the original message and what's the response. E-mail clients generally indent the original message with '>' or someting like that, but your client does not do that (which is pretty silly). And copying the message at the top does not really help. Please do something about that. On Fri, Jul 05, 2019 at 09:37:03PM +0000, Tom Mercha wrote: >>I might be missing something, but it seems like you intend to replace >>the SQL grammar we have with something else. It's not clear to me what >>would be the point of doing that, and it definitely looks like a huge >>amount of work - e.g. we don't have any support for switching between >>two distinct grammars the way you envision, and just that alone seems >>like a multi-year project. And if you don't have that capability, all >>external tools kinda stop working. Good luck with running such database. > >I was considering having two distinct grammars as an option - thanks >for indicating the effort involved. At the end of the day I want both >my DSL and the PostgreSQL grammars to coexist. Is extending >PostgreSQL's grammar with my own through the PostgreSQL extension >infrastructure worth consideration or is it also difficult to develop? >Could you suggest any reference material on this topic? > Well, I'm not an expert in that area, but we currently don't have any infrastructure to support that. It's a topic that was discussed in the past (perhaps you can find some references in the archives) and it generally boils down to: 1) We're using bison as parser generator. 2) Bison does not allow adding rules on the fly. So you have to modify the in-core src/backend/parser/gram.y and rebuild postgres. See for example for an example of such discussion https://www.postgresql.org/message-id/flat/CABSN6VeeEhwb0HrjOCp9kHaWm0Ljbnko5y-0NKsT_%3D5i5C2jog%40mail.gmail.com When two of the smartest people on the list say it's a hard problem, it probably is. Particularly for someone who does not know the internals. >>What I'd look at first is implementing the grammar as a procedural >>language (think PL/pgSQL, pl/perl etc.) implementing whatever you >>expect from your DSL. And it's not like you'd have to wrap everything >>in functions, because we have anonymous DO blocks. > >Thanks for pointing out this direction! I think I will indeed adopt >this approach especially if directly extending PostgreSQL grammar would >be difficult. Well, it's the only way to deal with it at the moment. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 06/07/2019 00:06, Tomas Vondra wrote: > First of all, it's pretty difficult to follow the discussion when it's > not clear what's the original message and what's the response. E-mail > clients generally indent the original message with '>' or someting like > that, but your client does not do that (which is pretty silly). And > copying the message at the top does not really help. Please do something > about that. I would like to apologise. I did not realize that my client was doing that and now I have changed the client. I hope it's fine now. > > On Fri, Jul 05, 2019 at 09:37:03PM +0000, Tom Mercha wrote: >>> I might be missing something, but it seems like you intend to replace >>> the SQL grammar we have with something else. It's not clear to me what >>> would be the point of doing that, and it definitely looks like a huge >>> amount of work - e.g. we don't have any support for switching between >>> two distinct grammars the way you envision, and just that alone seems >>> like a multi-year project. And if you don't have that capability, all >>> external tools kinda stop working. Good luck with running such database. >> >> I was considering having two distinct grammars as an option - thanks >> for indicating the effort involved. At the end of the day I want both >> my DSL and the PostgreSQL grammars to coexist. Is extending >> PostgreSQL's grammar with my own through the PostgreSQL extension >> infrastructure worth consideration or is it also difficult to develop? >> Could you suggest any reference material on this topic? >> > > Well, I'm not an expert in that area, but we currently don't have any > infrastructure to support that. It's a topic that was discussed in the > past (perhaps you can find some references in the archives) and it > generally boils down to: > > 1) We're using bison as parser generator. > 2) Bison does not allow adding rules on the fly. > > So you have to modify the in-core src/backend/parser/gram.y and rebuild > postgres. See for example for an example of such discussion > > https://www.postgresql.org/message-id/flat/CABSN6VeeEhwb0HrjOCp9kHaWm0Ljbnko5y-0NKsT_%3D5i5C2jog%40mail.gmail.com > > > When two of the smartest people on the list say it's a hard problem, it > probably is. Particularly for someone who does not know the internals. You are right. Thanks for bringing it to my attention! I didn't design my language for interaction with triggers and whatnot, but I think that it would be very interesting to support those as well, so looking at CREATE LANGUAGE functionality is actually exciting and appropriate once I make some changes in design. Thanks again for this point! I hope this is not off topic but I was wondering if you know what are the intrinsic differences between HANDLER and INLINE parameters of CREATE LANGUAGE? I know that they are functions which are invoked at different instances of time (e.g. one is for handling anonymous code blocks), but at the end of the day they seem to have the same purpose? >>> What I'd look at first is implementing the grammar as a procedural >>> language (think PL/pgSQL, pl/perl etc.) implementing whatever you >>> expect from your DSL. And it's not like you'd have to wrap everything >>> in functions, because we have anonymous DO blocks. >> >> Thanks for pointing out this direction! I think I will indeed adopt >> this approach especially if directly extending PostgreSQL grammar would >> be difficult. > > Well, it's the only way to deal with it at the moment. > > > regards >
On Sun, Jul 07, 2019 at 11:06:38PM +0000, Tom Mercha wrote: >On 06/07/2019 00:06, Tomas Vondra wrote: >> First of all, it's pretty difficult to follow the discussion when it's >> not clear what's the original message and what's the response. E-mail >> clients generally indent the original message with '>' or someting like >> that, but your client does not do that (which is pretty silly). And >> copying the message at the top does not really help. Please do something >> about that. > >I would like to apologise. I did not realize that my client was doing >that and now I have changed the client. I hope it's fine now. > Thanks, seems fine now. >> >> On Fri, Jul 05, 2019 at 09:37:03PM +0000, Tom Mercha wrote: >>>> I might be missing something, but it seems like you intend to replace >>>> the SQL grammar we have with something else. It's not clear to me what >>>> would be the point of doing that, and it definitely looks like a huge >>>> amount of work - e.g. we don't have any support for switching between >>>> two distinct grammars the way you envision, and just that alone seems >>>> like a multi-year project. And if you don't have that capability, all >>>> external tools kinda stop working. Good luck with running such database. >>> >>> I was considering having two distinct grammars as an option - thanks >>> for indicating the effort involved. At the end of the day I want both >>> my DSL and the PostgreSQL grammars to coexist. Is extending >>> PostgreSQL's grammar with my own through the PostgreSQL extension >>> infrastructure worth consideration or is it also difficult to develop? >>> Could you suggest any reference material on this topic? >>> >> >> Well, I'm not an expert in that area, but we currently don't have any >> infrastructure to support that. It's a topic that was discussed in the >> past (perhaps you can find some references in the archives) and it >> generally boils down to: >> >> 1) We're using bison as parser generator. >> 2) Bison does not allow adding rules on the fly. >> >> So you have to modify the in-core src/backend/parser/gram.y and rebuild >> postgres. See for example for an example of such discussion >> >> https://www.postgresql.org/message-id/flat/CABSN6VeeEhwb0HrjOCp9kHaWm0Ljbnko5y-0NKsT_%3D5i5C2jog%40mail.gmail.com >> >> >> When two of the smartest people on the list say it's a hard problem, it >> probably is. Particularly for someone who does not know the internals. >You are right. Thanks for bringing it to my attention! > >I didn't design my language for interaction with triggers and whatnot, >but I think that it would be very interesting to support those as well, >so looking at CREATE LANGUAGE functionality is actually exciting and >appropriate once I make some changes in design. Thanks again for this point! > ;-) >I hope this is not off topic but I was wondering if you know what are >the intrinsic differences between HANDLER and INLINE parameters of >CREATE LANGUAGE? I know that they are functions which are invoked at >different instances of time (e.g. one is for handling anonymous code >blocks), but at the end of the day they seem to have the same purpose? > I've never written any PL handler, so I don't know. All I know is this quote from the docs, right below the simple example of PL handler: Only a few thousand lines of code have to be added instead of the dots to complete the call handler. I suppose the best idea to start an implementation is to copy an existing PL implementation, and modify that. That's usually much easier than starting from scratch, because you have something that works. Not sure if PL/pgSQL is the right choice though, perhaps pick some other language from https://wiki.postgresql.org/wiki/PL_Matrix regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 09/07/2019 23:22, Tomas Vondra wrote: > On Sun, Jul 07, 2019 at 11:06:38PM +0000, Tom Mercha wrote: >> On 06/07/2019 00:06, Tomas Vondra wrote: >>> First of all, it's pretty difficult to follow the discussion when it's >>> not clear what's the original message and what's the response. E-mail >>> clients generally indent the original message with '>' or someting like >>> that, but your client does not do that (which is pretty silly). And >>> copying the message at the top does not really help. Please do something >>> about that. >> >> I would like to apologise. I did not realize that my client was doing >> that and now I have changed the client. I hope it's fine now. >> > > Thanks, seems fine now. > >>> >>> On Fri, Jul 05, 2019 at 09:37:03PM +0000, Tom Mercha wrote: >>>>> I might be missing something, but it seems like you intend to replace >>>>> the SQL grammar we have with something else. It's not clear to me what >>>>> would be the point of doing that, and it definitely looks like a huge >>>>> amount of work - e.g. we don't have any support for switching between >>>>> two distinct grammars the way you envision, and just that alone seems >>>>> like a multi-year project. And if you don't have that capability, all >>>>> external tools kinda stop working. Good luck with running such >>>>> database. >>>> >>>> I was considering having two distinct grammars as an option - thanks >>>> for indicating the effort involved. At the end of the day I want both >>>> my DSL and the PostgreSQL grammars to coexist. Is extending >>>> PostgreSQL's grammar with my own through the PostgreSQL extension >>>> infrastructure worth consideration or is it also difficult to develop? >>>> Could you suggest any reference material on this topic? >>>> >>> >>> Well, I'm not an expert in that area, but we currently don't have any >>> infrastructure to support that. It's a topic that was discussed in the >>> past (perhaps you can find some references in the archives) and it >>> generally boils down to: >>> >>> 1) We're using bison as parser generator. >>> 2) Bison does not allow adding rules on the fly. >>> >>> So you have to modify the in-core src/backend/parser/gram.y and rebuild >>> postgres. See for example for an example of such discussion >>> >>> https://www.postgresql.org/message-id/flat/CABSN6VeeEhwb0HrjOCp9kHaWm0Ljbnko5y-0NKsT_%3D5i5C2jog%40mail.gmail.com >>> >>> >>> >>> When two of the smartest people on the list say it's a hard problem, it >>> probably is. Particularly for someone who does not know the internals. >> You are right. Thanks for bringing it to my attention! >> >> I didn't design my language for interaction with triggers and whatnot, >> but I think that it would be very interesting to support those as well, >> so looking at CREATE LANGUAGE functionality is actually exciting and >> appropriate once I make some changes in design. Thanks again for this >> point! >> > > ;-) > >> I hope this is not off topic but I was wondering if you know what are >> the intrinsic differences between HANDLER and INLINE parameters of >> CREATE LANGUAGE? I know that they are functions which are invoked at >> different instances of time (e.g. one is for handling anonymous code >> blocks), but at the end of the day they seem to have the same purpose? >> > > I've never written any PL handler, so I don't know. All I know is this > quote from the docs, right below the simple example of PL handler: > > Only a few thousand lines of code have to be added instead of the > dots to complete the call handler. > > I suppose the best idea to start an implementation is to copy an > existing PL implementation, and modify that. That's usually much easier > than starting from scratch, because you have something that works. Not > sure if PL/pgSQL is the right choice though, perhaps pick some other > language from https://wiki.postgresql.org/wiki/PL_Matrix > I understand that you never wrote any PL handler but was just thinking about this functionality as a follow-up to our conversation. I was just wondering whether anonymous DO blocks *must* return void or not? The docs for DO say it is a function returning void - https://www.postgresql.org/docs/current/sql-do.html But the docs for CREATE LANGUAGE's INLINE HANDLER say 'typically return void' - https://www.postgresql.org/docs/current/sql-createlanguage.html Is the implication that we can make the DO block return something somehow? I would be quite interested if there is a way of achieving this kind of functionality. My experiments using an SRF, which I have written, within an anonymous DO block just gives me an "ERROR: set-valued function called in context that cannot accept a set". Anyway maybe I'm going off on a tangent here... perhaps it is better to open a new thread? > > regards >
Re: Extending PostgreSQL with a Domain-Specific Language (DSL) - Development
From
"David G. Johnston"
Date:
On Tue, Jul 9, 2019 at 5:23 PM Tom Mercha <mercha_t@hotmail.com> wrote:
I understand that you never wrote any PL handler but was just thinking
about this functionality as a follow-up to our conversation. I was just
wondering whether anonymous DO blocks *must* return void or not?
The docs for DO say it is a function returning void -
https://www.postgresql.org/docs/current/sql-do.html
But the docs for CREATE LANGUAGE's INLINE HANDLER say 'typically return
void' - https://www.postgresql.org/docs/current/sql-createlanguage.html
No, the language cannot override the SQL execution environment's limitations.
"The code block is treated as though it were the body of a function with no parameters, returning void. It is parsed and executed a single time."
The above applies regardless of the language the code block is written in.
It can, however, affect permanent session state (so, use tables).
David J.
On 10/07/2019 02:31, David G. Johnston wrote: > On Tue, Jul 9, 2019 at 5:23 PM Tom Mercha <mercha_t@hotmail.com> wrote: > >> >> I understand that you never wrote any PL handler but was just thinking >> about this functionality as a follow-up to our conversation. I was just >> wondering whether anonymous DO blocks *must* return void or not? >> >> The docs for DO say it is a function returning void - >> https://www.postgresql.org/docs/current/sql-do.html > > > >> > > But the docs for CREATE LANGUAGE's INLINE HANDLER say 'typically return >> void' - https://www.postgresql.org/docs/current/sql-createlanguage.html > > > No, the language cannot override the SQL execution environment's > limitations. > > "The code block is treated as though it were the body of a function with no > parameters, returning void. It is parsed and executed a single time." > > The above applies regardless of the language the code block is written in. > > It can, however, affect permanent session state (so, use tables). > Thank you very much for addressing the question. I am still a bit of a novice with PostgreSQL internals. Could you please provide some more detail on your comment regarding affecting permanent session state? > David J. >
Re: Extending PostgreSQL with a Domain-Specific Language (DSL) - Development
From
"David G. Johnston"
Date:
On Tue, Jul 9, 2019 at 5:43 PM Tom Mercha <mercha_t@hotmail.com> wrote:
I am still a bit of a novice with PostgreSQL internals. Could you please
provide some more detail on your comment regarding affecting permanent
session state?
I was not referring to internals.
CREATE TEMP TABLE tempdo (id int);
DO $$
BEGIN
INSERT INTO tempdo VALUES (1);
END;
$$;
SELECT * FROM tempdo;
ROLLBACK;
David J.