Thread: Extensibility of the PostgreSQL wire protocol
The following is a request for discussion and comments, not a refined proposal accompanied by a working patch.
As recently announced, Amazon Web Services is working on Babelfish, a set of extensions that will allow PostgreSQL to be compatible with other database systems. One part of this will be an extension that allows PostgreSQL to listen on a secondary port and process a different wire protocol. The first extension we are creating in this direction handles the Tabular Data Stream (TDS) protocol, used by Sybase and Microsoft SQL Server databases. It is more efficient to build an extension that handles the TDS protocol inside the backend than to create a proxy process that translates between TDS and the libpq protocol.
Creating the necessary infrastructure in the postmaster and backend will open up more possibilities that are not tied to our compatibility efforts. Possible use cases for wire protocol extensibility include the development of a completely new, non-backwards-compatible PostgreSQL protocol, or extending the existing wire protocol with features specific to third-party connection pools (for example, transferring file descriptors between the pool and a working backend).
Our current plan is to create a new set of API calls and hooks that allow additional wire protocols to be registered. The existing backend libpq implementation will be modified to register itself using the new API. This will serve as a proof of concept as well as ensure that the API definition is not slanted towards a specific protocol. It is also similar to the way table access methods and compression methods are added.
A wire protocol extension will be a standard PostgreSQL dynamically loadable extension module. The wire protocol extensions to load will be listed in the shared_preload_libraries GUC. The extension's init function will register a hook function to be called at the point where the postmaster currently creates the libpq server sockets. This hook callback will then create the server sockets and, using a new API function, register them for monitoring via select(2) in the postmaster main loop. Part of the registration information is a set of callback functions to invoke for accepting and authenticating incoming connections and for error reporting, as well as a function that implements the TCOP loop for the protocol. Ongoing work on the TDS protocol has shown us that different protocols make it desirable to have separate implementations of the TCOP loop. The TCOP function will return only after the connection has been terminated. Fortunately, half the interface already exists, since the sending of result sets is implemented via callback functions registered as the destination receiver (DestReceiver), which works quite well in our current code.
Regards, Jan
--
Jan Wieck
Principal Database Engineer
Amazon Web Services
On Mon, Jan 25, 2021 at 10:07 AM Jan Wieck <jan@wi3ck.info> wrote:
> The following is a request for discussion and comments, not a refined proposal accompanied by a working patch.
After implementing this three different ways inside the backend over the years, I landed on almost this identical approach for handling the MySQL, TDS, MongoDB, and Oracle protocols for NEXTGRES.
Initially, each was implemented as a background worker extension, which had to handle its own networking, pass the fd off to new protocol-specific connections, etc. This worked, but duplicated a good amount of logic. It would be great to have a standard, loadable way to add support for a new protocol.
Jonah H. Harris
Hi Jonah,
On Mon, Jan 25, 2021 at 10:18 AM Jonah H. Harris <jonah.harris@gmail.com> wrote:
> On Mon, Jan 25, 2021 at 10:07 AM Jan Wieck <jan@wi3ck.info> wrote:
> > The following is a request for discussion and comments, not a refined proposal accompanied by a working patch.
>
> After implementing this three different ways inside the backend over the years, I landed on almost this identical approach for handling the MySQL, TDS, MongoDB, and Oracle protocols for NEXTGRES.
Could any of that be open sourced? It would be an excellent addition to add one of those as example code.
Regards, Jan
Jan Wieck
On Mon, Jan 25, 2021 at 10:07 AM Jan Wieck <jan@wi3ck.info> wrote:
> Our current plan is to create a new set of API calls and hooks that allow to register additional wire protocols. The existing backend libpq implementation will be modified to register itself using the new API. This will serve as a proof of concept as well as ensure that the API definition is not slanted towards a specific protocol. It is also similar to the way table access methods and compression methods are added.

If we're going to end up with an open source implementation of something useful in contrib or whatever, then I think this is fine. But, if not, then we're just making it easier for Amazon to do proprietary stuff without getting any benefit for the open-source project. In fact, in that case PostgreSQL would have to somehow ensure that the hooks don't get broken without having any code that actually uses them, so not only would the project get no benefit, but it would actually incur a small tax. I wouldn't say that's an absolute show-stopper, but it definitely isn't my first choice.

--
Robert Haas
EDB: http://www.enterprisedb.com
>
> On Mon, Jan 25, 2021 at 10:07 AM Jan Wieck <jan@wi3ck.info> wrote:
> > Our current plan is to create a new set of API calls and hooks that allow to register additional wire protocols. The existing backend libpq implementation will be modified to register itself using the new API. This will serve as a proof of concept as well as ensure that the API definition is not slanted towards a specific protocol. It is also similar to the way table access methods and compression methods are added.
>
> If we're going to end up with an open source implementation of
> something useful in contrib or whatever, then I think this is fine.
> But, if not, then we're just making it easier for Amazon to do
> proprietary stuff without getting any benefit for the open-source
> project. In fact, in that case PostgreSQL would have to somehow
> ensure that the hooks don't get broken without having any code that
> actually uses them, so not only would the project get no benefit, but
> it would actually incur a small tax. I wouldn't say that's an
> absolute show-stopper, but it definitely isn't my first choice.
As far as I understood, Jan's proposal is to add enough hooks to PostgreSQL to enable us to extend the wire protocol, and to add a contrib module as an example (maybe TDS, HTTP, or just new capabilities for the current implementation).
Regards,
On Wed, Feb 10, 2021 at 11:43 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jan 25, 2021 at 10:07 AM Jan Wieck <jan@wi3ck.info> wrote:
> Our current plan is to create a new set of API calls and hooks that allow to register additional wire protocols. The existing backend libpq implementation will be modified to register itself using the new API. This will serve as a proof of concept as well as ensure that the API definition is not slanted towards a specific protocol. It is also similar to the way table access methods and compression methods are added.
If we're going to end up with an open source implementation of
something useful in contrib or whatever, then I think this is fine.
But, if not, then we're just making it easier for Amazon to do
proprietary stuff without getting any benefit for the open-source
project. In fact, in that case PostgreSQL would have to somehow
ensure that the hooks don't get broken without having any code that
actually uses them, so not only would the project get no benefit, but
it would actually incur a small tax. I wouldn't say that's an
absolute show-stopper, but it definitely isn't my first choice.
Agreed that we shouldn't add substantial hooks if they're not likely to be used. While I haven't yet seen AWS' implementation or concrete proposal, given the people involved, I assume it's fairly similar to how I implemented it. Assuming that's correct and it doesn't require substantial redevelopment, I'd certainly open-source my MySQL-compatible protocol and parser implementation. From my perspective, it would be awesome if these could be done as extensions.
While I'm not planning to open source it as of yet, for my Oracle-compatible stuff, I don't think I'd be able to do anything other than the protocol as an extension given the core-related changes similar to what EDB has to do. I don't think there's any easy way to get around that. But, for the protocol and any type of simple translation to Postgres' dialect, I think that could easily be hook-based.
Jonah H. Harris
Robert Haas <robertmhaas@gmail.com> writes:
> If we're going to end up with an open source implementation of
> something useful in contrib or whatever, then I think this is fine.
> But, if not, then we're just making it easier for Amazon to do
> proprietary stuff without getting any benefit for the open-source
> project. In fact, in that case PostgreSQL would have to somehow
> ensure that the hooks don't get broken without having any code that
> actually uses them, so not only would the project get no benefit, but
> it would actually incur a small tax. I wouldn't say that's an
> absolute show-stopper, but it definitely isn't my first choice.

As others noted, a test module could be built to add some coverage here.

What I'm actually more concerned about, in this whole line of development, is the follow-on requests that will surely occur to kluge up Postgres to make its behavior more like $whatever. As in "well, now that we can serve MySQL clients protocol-wise, can't we pretty please have a mode that makes the parser act more like MySQL". If we start having modes for MySQL identifier quoting, Oracle outer join syntax, yadda yadda, it's going to be way more of a maintenance nightmare than some hook functions. So if we accept any patch along this line, I want to drive a hard stake in the ground that the answer to that sort of thing will be NO.

Assuming we're going to keep to that, though, it seems like people doing this sort of thing will inevitably end up with a fork anyway. So maybe we should just not bother with the first step either.

regards, tom lane
On Wed, Feb 10, 2021 at 1:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
What I'm actually more concerned about, in this whole line of development,
is the follow-on requests that will surely occur to kluge up Postgres
to make its behavior more like $whatever. As in "well, now that we
can serve MySQL clients protocol-wise, can't we pretty please have a
mode that makes the parser act more like MySQL". If we start having
modes for MySQL identifier quoting, Oracle outer join syntax, yadda
yadda, it's going to be way more of a maintenance nightmare than some
hook functions. So if we accept any patch along this line, I want to
drive a hard stake in the ground that the answer to that sort of thing
will be NO.
Actually, a substantial amount can be done with hooks. For Oracle, which is substantially harder than MySQL, I have a completely separate parser that generates a PG-compatible parse tree packaged up as an extension. To handle autonomous transactions, database links, hierarchical query conversion, hints, and some execution-related items requires core changes. But, the protocol and parsing can definitely be done with hooks. And, as was mentioned previously, this isn't tied directly to emulating another database - it would enable us to support an HTTP-ish interface directly in the server as an extension as well. A lot of this can be done with background worker extensions now, which is how my stuff was primarily architected, but it's hacky when it comes to areas where the items Jan discussed could clean things up and make them more pluggable.
Assuming we're going to keep to that, though, it seems like people
doing this sort of thing will inevitably end up with a fork anyway.
So maybe we should just not bother with the first step either.
Perhaps I'm misunderstanding you, but I wouldn't throw this entire idea out (which enables a substantial addition of extensible functionality with a limited set of touchpoints) on the premise of future objections.
Jonah H. Harris
"Jonah H. Harris" <jonah.harris@gmail.com> writes: > On Wed, Feb 10, 2021 at 1:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> ... If we start having >> modes for MySQL identifier quoting, Oracle outer join syntax, yadda >> yadda, it's going to be way more of a maintenance nightmare than some >> hook functions. So if we accept any patch along this line, I want to >> drive a hard stake in the ground that the answer to that sort of thing >> will be NO. > Actually, a substantial amount can be done with hooks. For Oracle, which is > substantially harder than MySQL, I have a completely separate parser that > generates a PG-compatible parse tree packaged up as an extension. To handle > autonomous transactions, database links, hierarchical query conversion, > hints, and some execution-related items requires core changes. That is a spot-on definition of where I do NOT want to end up. Hooks everywhere and enormous extensions that break anytime we change anything in the core. It's not really clear that anybody is going to find that more maintainable than a straight fork, except to the extent that it enables the erstwhile forkers to shove some of their work onto the PG community. My feeling about this is if you want to use Oracle, go use Oracle. Don't ask PG to take on a ton of maintenance issues so you can have a frankenOracle. regards, tom lane
On Wed, Feb 10, 2021 at 2:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
That is a spot-on definition of where I do NOT want to end up. Hooks
everywhere and enormous extensions that break anytime we change anything
in the core. It's not really clear that anybody is going to find that
more maintainable than a straight fork, except to the extent that it
enables the erstwhile forkers to shove some of their work onto the PG
community.
Given the work over the last few major releases to make several other aspects of Postgres pluggable, how is implementing a pluggable protocol API any different?
To me, this sounds more like a philosophical disagreement with how people could potentially use Postgres than a technical one. My point is only that, using current PG functionality, I could equally write a pluggable storage interface for my Oracle and InnoDB data file readers/writers, which would similarly allow for the creation of a Postgres franken-Oracle by extension only.
I don't think anyone is asking for hooks for all the things I mentioned - a pluggable transaction manager, for example, doesn't make much sense. But, when it comes to having actually done this vs. posited about its usefulness, I'd say it has some merit and doesn't really introduce that much complexity or maintenance overhead to core - whether the extensions still work properly is up to the extension authors... isn't that the whole point of extensions?
Jonah H. Harris
On Wed, Feb 10, 2021 at 11:43 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jan 25, 2021 at 10:07 AM Jan Wieck <jan@wi3ck.info> wrote:
> Our current plan is to create a new set of API calls and hooks that allow to register additional wire protocols. The existing backend libpq implementation will be modified to register itself using the new API. This will serve as a proof of concept as well as ensure that the API definition is not slanted towards a specific protocol. It is also similar to the way table access methods and compression methods are added.
If we're going to end up with an open source implementation of
something useful in contrib or whatever, then I think this is fine.
But, if not, then we're just making it easier for Amazon to do
proprietary stuff without getting any benefit for the open-source
project. In fact, in that case PostgreSQL would have to somehow
ensure that the hooks don't get broken without having any code that
actually uses them, so not only would the project get no benefit, but
it would actually incur a small tax. I wouldn't say that's an
absolute show-stopper, but it definitely isn't my first choice.
At this very moment there are several parts to this. One is the hooks to make wire protocols into loadable modules, which is what this effort is about. Another is the TDS protocol as it is being implemented for Babelfish, and the third is the Babelfish extension itself. The latter two will require additional hooks and APIs that I am not going to address here; I consider them not material to my effort.
As for making the wire protocol itself expandable, I really see a lot of potential outside of what Amazon wants here, and I would not be advertising it if it were for Babelfish alone. As I laid out, the ability for a third party to add messages for special connection pool support would alone be enough to make it useful. There have also been discussions in the JDBC subproject about combining certain messages into one single message. Why not allow the JDBC project to develop their own, JDBC-optimized backend side? Last but not least, what would be wrong with listening for MariaDB clients?
I am planning on a follow up project to this, demoting libpq itself to just another loadable protocol. Just the way procedural languages are all on the same level because that is how I developed the loadable, procedural language handler all those years ago.
Considering how spread out and quite frankly unorganized our wire protocol handling is, this is not a small order.
Regards, Jan
--
Jan Wieck
On Wed, Feb 10, 2021 at 2:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> That is a spot-on definition of where I do NOT want to end up. Hooks
> everywhere and enormous extensions that break anytime we change anything
> in the core. It's not really clear that anybody is going to find that
> more maintainable than a straight fork, except to the extent that it
> enables the erstwhile forkers to shove some of their work onto the PG
> community.

+1.

Making the lexer and parser extensible seems desirable to me. It would be beneficial not only for companies like EDB and Amazon that might want to extend the grammar in various ways, but also for extension authors. However, it's vastly harder than Jan's proposal to make the wire protocol pluggable. The wire protocol is pretty well-isolated from the rest of the system. As long as you can get queries out of the packets the client sends and package up the results to send back, it's all good. The parser, on the other hand, is not at all well-isolated from the rest of the system. There's a LOT of code that knows a whole lot of stuff about the structure of parse trees, so your variant parser can't produce parse trees for new kinds of DDL, or for new query constructs. And if it parsed some completely different syntax where, say, joins were not explicit, it would still have to figure out how to represent them in a way that looked just like it came out of the regular parser -- otherwise, parse analysis and query planning and so forth are not going to work, unless you go and change a lot of other code too, and I don't really have any idea how we could solve that, even in theory. But that kind of thing just isn't a problem for the proposal on this thread.

That being said, I'm not in favor of transferring maintenance work to the community for this set of hooks any more than I am for something on the parsing side. In general, I'm in favor of as much extensibility as we can reasonably create, but with a complicated proposal like this one, the community should expect to be able to get something out of it. And so far what I hear Jan saying is that these hooks could in theory be used for things other than Amazon's proprietary efforts, and those things could in theory bring benefits to the community, but there are no actual plans to do anything with this that would benefit anyone other than Amazon. Which seems to bring us right back to expecting the community to maintain things for the benefit of third-party forks.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Feb 11, 2021 at 9:28 AM Robert Haas <robertmhaas@gmail.com> wrote:
That being said, I'm not in favor of transferring maintenance work to
the community for this set of hooks any more than I am for something
on the parsing side. In general, I'm in favor of as much extensibility
as we can reasonably create, but with a complicated proposal like this
one, the community should expect to be able to get something out of
it. And so far what I hear Jan saying is that these hooks could in
theory be used for things other than Amazon's proprietary efforts and
those things could in theory bring benefits to the community, but
there are no actual plans to do anything with this that would benefit
anyone other than Amazon. Which seems to bring us right back to
expecting the community to maintain things for the benefit of
third-party forks.
I'm quite sure I said I'd open source my MySQL implementation, which allows Postgres to appear to MySQL clients as a MySQL/MariaDB server. This is neither proprietary nor Amazon-related and makes Postgres substantially more useful for a large number of applications.
As Jan said in his last email, they're not proposing all the different aspects needed. In fact, nothing has actually been proposed yet. This is an entirely philosophical debate. I don't even know what's being proposed at this point - I just know it *could* be useful. Let's just wait and see what is actually proposed before shooting it down, yes?
Jonah H. Harris
On Thu, Feb 11, 2021 at 9:42 AM Jonah H. Harris <jonah.harris@gmail.com> wrote:
> I'm quite sure I said I'd open source my MySQL implementation, which allows Postgres to appear to MySQL clients as a MySQL/MariaDB server. This is neither proprietary nor Amazon-related and makes Postgres substantially more useful for a large number of applications.

OK. There's stuff to think about there, too: do we want that in contrib? Is it in good enough shape to be in contrib even if we did? If it's not in contrib, how do we incorporate it into, say, the buildfarm, so that we know if we break something? Is it actively maintained and stable, so that if it needs adjustment for upstream changes we can count on that getting addressed in a timely fashion? I don't know the answers to these questions and am not trying to prejudge, but I think they are important and relevant questions.

> As Jan said in his last email, they're not proposing all the different aspects needed. In fact, nothing has actually been proposed yet. This is an entirely philosophical debate. I don't even know what's being proposed at this point - I just know it *could* be useful. Let's just wait and see what is actually proposed before shooting it down, yes?

I don't think I'm trying to shoot anything down, because as I said, I like extensibility and am generally in favor of it. Rather, I'm expressing a concern which seems to me to be justified, based on what was posted. I'm sorry that my tone seems to have aggravated you, but it wasn't intended to do so.

--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Feb 11, 2021 at 9:42 AM Jonah H. Harris <jonah.harris@gmail.com> wrote:
>> As Jan said in his last email, they're not proposing all the different
>> aspects needed. In fact, nothing has actually been proposed yet. This
>> is an entirely philosophical debate. I don't even know what's being
>> proposed at this point - I just know it *could* be useful. Let's just
>> wait and see what is actually proposed before shooting it down, yes?

> I don't think I'm trying to shoot anything down, because as I said, I
> like extensibility and am generally in favor of it. Rather, I'm
> expressing a concern which seems to me to be justified, based on what
> was posted. I'm sorry that my tone seems to have aggravated you, but
> it wasn't intended to do so.

Likewise, the point I was trying to make is that a "pluggable wire protocol" is only a tiny part of what would be needed to have a credible MySQL, Oracle, or whatever clone. There are large semantic differences from those products; there are maintenance issues arising from the fact that we whack structures like parse trees around all the time; and so on. Maybe there is some useful thing that can be accomplished here, but we need to consider the bigger picture rather than believing (without proof) that a few hook variables will be enough to do anything.

regards, tom lane
On 2/11/21 10:06 AM, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Feb 11, 2021 at 9:42 AM Jonah H. Harris <jonah.harris@gmail.com> wrote:
>>> As Jan said in his last email, they're not proposing all the different
>>> aspects needed. In fact, nothing has actually been proposed yet. This
>>> is an entirely philosophical debate. I don't even know what's being
>>> proposed at this point - I just know it *could* be useful. Let's just
>>> wait and see what is actually proposed before shooting it down, yes?
>> I don't think I'm trying to shoot anything down, because as I said, I
>> like extensibility and am generally in favor of it. Rather, I'm
>> expressing a concern which seems to me to be justified, based on what
>> was posted. I'm sorry that my tone seems to have aggravated you, but
>> it wasn't intended to do so.
> Likewise, the point I was trying to make is that a "pluggable wire
> protocol" is only a tiny part of what would be needed to have a credible
> MySQL, Oracle, or whatever clone. There are large semantic differences
> from those products; there are maintenance issues arising from the fact
> that we whack structures like parse trees around all the time; and so on.
> Maybe there is some useful thing that can be accomplished here, but we
> need to consider the bigger picture rather than believing (without proof)
> that a few hook variables will be enough to do anything.

Yeah. I think we'd need a fairly fully worked implementation to see where it goes. Is Amazon going to release (under TPL) its TDS implementation of this? That might go a long way to convincing me this is worth considering.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On Thu, Feb 11, 2021 at 10:29 AM Andrew Dunstan <andrew@dunslane.net> wrote:
On 2/11/21 10:06 AM, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Feb 11, 2021 at 9:42 AM Jonah H. Harris <jonah.harris@gmail.com> wrote:
>>> As Jan said in his last email, they're not proposing all the different
>>> aspects needed. In fact, nothing has actually been proposed yet. This
>>> is an entirely philosophical debate. I don't even know what's being
>>> proposed at this point - I just know it *could* be useful. Let's just
>>> wait and see what is actually proposed before shooting it down, yes?
>> I don't think I'm trying to shoot anything down, because as I said, I
>> like extensibility and am generally in favor of it. Rather, I'm
>> expressing a concern which seems to me to be justified, based on what
>> was posted. I'm sorry that my tone seems to have aggravated you, but
>> it wasn't intended to do so.
> Likewise, the point I was trying to make is that a "pluggable wire
> protocol" is only a tiny part of what would be needed to have a credible
> MySQL, Oracle, or whatever clone. There are large semantic differences
> from those products; there are maintenance issues arising from the fact
> that we whack structures like parse trees around all the time; and so on.
> Maybe there is some useful thing that can be accomplished here, but we
> need to consider the bigger picture rather than believing (without proof)
> that a few hook variables will be enough to do anything.
Yeah. I think we'd need a fairly fully worked implementation to see
where it goes. Is Amazon going to release (under TPL) its TDS
implementation of this? That might go a long way to convincing me this
is worth considering.
Everything is planned to be released under the Apache 2.0 license so people are free to do with it as they choose.
On Wed, Feb 10, 2021 at 11:04 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Jonah H. Harris" <jonah.harris@gmail.com> writes:
> On Wed, Feb 10, 2021 at 1:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... If we start having
>> modes for MySQL identifier quoting, Oracle outer join syntax, yadda
>> yadda, it's going to be way more of a maintenance nightmare than some
>> hook functions. So if we accept any patch along this line, I want to
>> drive a hard stake in the ground that the answer to that sort of thing
>> will be NO.
> Actually, a substantial amount can be done with hooks. For Oracle, which is
> substantially harder than MySQL, I have a completely separate parser that
> generates a PG-compatible parse tree packaged up as an extension. To handle
> autonomous transactions, database links, hierarchical query conversion,
> hints, and some execution-related items requires core changes.
That is a spot-on definition of where I do NOT want to end up. Hooks
everywhere and enormous extensions that break anytime we change anything
in the core. It's not really clear that anybody is going to find that
more maintainable than a straight fork, except to the extent that it
enables the erstwhile forkers to shove some of their work onto the PG
community.
My feeling about this is if you want to use Oracle, go use Oracle.
Don't ask PG to take on a ton of maintenance issues so you can have
a frankenOracle.
PostgreSQL over the last decade spent a considerable amount of effort on becoming extensible outside of core. We are now useful in workloads nobody would have considered in 2004 or 2008.
The more extensibility we add, the LESS we maintain. It is a lot easier to maintain an API than it is an entire kernel. When I look at all the interesting features coming from the ecosystem, they are all built on the hooks that this community worked so hard to create. This idea is an extension of that and a result of the community's success.
The more extensible we make PostgreSQL, the more the hacker community can innovate without damaging the PostgreSQL reputation as a rock solid database system.
Features like these only enable the entire community to innovate. Is the real issue that the more extensible PostgreSQL is, the more boring it will become?
JD
regards, tom lane
On Thu, Feb 11, 2021 at 12:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Thu, Feb 11, 2021 at 9:42 AM Jonah H. Harris <jonah.harris@gmail.com> wrote:
> >> As Jan said in his last email, they're not proposing all the different
> >> aspects needed. In fact, nothing has actually been proposed yet. This
> >> is an entirely philosophical debate. I don't even know what's being
> >> proposed at this point - I just know it *could* be useful. Let's just
> >> wait and see what is actually proposed before shooting it down, yes?
>
> > I don't think I'm trying to shoot anything down, because as I said, I
> > like extensibility and am generally in favor of it. Rather, I'm
> > expressing a concern which seems to me to be justified, based on what
> > was posted. I'm sorry that my tone seems to have aggravated you, but
> > it wasn't intended to do so.
>
> Likewise, the point I was trying to make is that a "pluggable wire
> protocol" is only a tiny part of what would be needed to have a credible
> MySQL, Oracle, or whatever clone. There are large semantic differences
> from those products; there are maintenance issues arising from the fact
> that we whack structures like parse trees around all the time; and so on.
> Maybe there is some useful thing that can be accomplished here, but we
> need to consider the bigger picture rather than believing (without proof)
> that a few hook variables will be enough to do anything.
>
Just so we don't miss the point: creating a compat protocol to mimic others (TDS, MySQL, etc.) is just one use case.
There are other use cases for making the wire protocol extensible. For example, for telemetry I could use some hooks to propagate context [1], get more detailed tracing information about the negotiation between frontend and backend, and implement a true query tracing tool.
Another use case is extending the current protocol to, for example, send more information about query execution in the CommandComplete message instead of just the number of affected rows.
About the HTTP protocol, I think PG should have it, maybe pure HTTP (no REST, just HTTP), because it's the most interoperable. Performance can still be very good with HTTP/2, and you have a huge ecosystem of tools and proxies (like Envoy) that would do wonders with this. You could safely query a db from a web page (passing through proxies that would do auth, TLS, etc.). Or maybe a higher-performing gRPC version (which is also HTTP/2 and is amazing), but that makes it a bit more difficult to query from a web page. In either case, context propagation is already built-in, and
On Thu, 11 Feb 2021 at 09:28, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Feb 10, 2021 at 2:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> That is a spot-on definition of where I do NOT want to end up. Hooks
> everywhere and enormous extensions that break anytime we change anything
> in the core. It's not really clear that anybody is going to find that
> more maintainable than a straight fork, except to the extent that it
> enables the erstwhile forkers to shove some of their work onto the PG
> community.
+1.
Making the lexer and parser extensible seems desirable to me. It would
be beneficial not only for companies like EDB and Amazon that might
want to extend the grammar in various ways, but also for extension
authors. However, it's vastly harder than Jan's proposal to make the
wire protocol pluggable. The wire protocol is pretty well-isolated
from the rest of the system. As long as you can get queries out of the
packets the client sends and package up the results to send back, it's
all good.
I would have to disagree that the wire protocol is well-isolated. Sending and receiving are not confined to a single file, and the message type codes are not even named constants, so trying to find a specific one is difficult. Anything that would clean this up would be a benefit.
That being said, I'm not in favor of transferring maintenance work to
the community for this set of hooks any more than I am for something
on the parsing side. In general, I'm in favor of as much extensibility
as we can reasonably create, but with a complicated proposal like this
one, the community should expect to be able to get something out of
it. And so far what I hear Jan saying is that these hooks could in
theory be used for things other than Amazon's proprietary efforts and
those things could in theory bring benefits to the community, but
there are no actual plans to do anything with this that would benefit
anyone other than Amazon. Which seems to bring us right back to
expecting the community to maintain things for the benefit of
third-party forks.
If this proposal brought us the ability to stream results, that would be a huge plus!
Dave Cramer
www.postgres.rocks
Attached are a first patch and a functioning extension that implements a telnet protocol server.
It is incomplete in that it doesn't address things like the COPY protocol. But it is enough to give a more detailed idea of what this interface will look like and what someone would do to implement their own protocol or extend an existing one.
The extension needs to be loaded via shared_preload_libraries and configured for a port number and listen_addresses as follows:
shared_preload_libraries = 'telnet_srv'
telnet_srv.listen_addresses = '*'
telnet_srv.port = 54323
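To make the shape of the proposed registration concrete, an extension's init function might look roughly like the sketch below. This is illustration only: every identifier except the GUC values above is a hypothetical stand-in for whatever the real API in the attached patch looks like, and it will not compile as-is.

```c
/* Sketch of a protocol extension's entry point; all registration
 * names (ProtocolCallbacks, RegisterWireProtocol) are invented. */

static void telnet_accept(Port *port);        /* accept + authenticate */
static void telnet_report_error(ErrorData *edata);
static void telnet_tcop_loop(Port *port);     /* returns on disconnect */

void
_PG_init(void)
{
    ProtocolCallbacks cb;

    cb.accept_conn = telnet_accept;
    cb.report_error = telnet_report_error;
    cb.tcop_loop = telnet_tcop_loop;

    /* Create the server sockets and have postmaster watch them in its
     * select(2) loop, invoking our callbacks on activity. */
    RegisterWireProtocol(pe_listen_addresses, pe_port, &cb);
}
```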
The overall idea here is to route all functions that communicate with the frontend through function pointers that hang off of MyProcPort. Since these functions perform socket communication, I believe one extra level of function pointer indirection is unlikely to have a significant performance impact.
Best Regards, Jan
On behalf of Amazon Web Services
On Sun, Feb 14, 2021 at 12:36 PM Dave Cramer <davecramer@postgres.rocks> wrote:
Jan Wieck
On Thu, Feb 18, 2021 at 9:32 PM Jan Wieck <jan@wi3ck.info> wrote:

And, here is how it looks with the following configuration:

telnet_srv.port = 1433
telnet_srv.listen_addresses = '*'

$ telnet localhost 1433
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
PostgreSQL Telnet Interface
database name: postgres
username: kuntal
password: changeme
> select 1;
 ?column?
 ----
 1
SELECT 1
> select 1/0;
Message: ERROR - division by zero

Few comments in the extension code (although experimental):

1. In telnet_srv.c,

+ static int pe_port;
..
+ DefineCustomIntVariable("telnet_srv.port",
+                         "Telnet server port.",
+                         NULL,
+                         &pe_port,
+                         pe_port,
+                         1024,
+                         65536,
+                         PGC_POSTMASTER,
+                         0,
+                         NULL,
+                         NULL,
+                         NULL);

The variable pe_port should be initialized to a value which is > 1024 and < 65536. Otherwise, the following assert will fail:

TRAP: FailedAssertion("newval >= conf->min", File: "guc.c", Line: 5541, PID: 12100)

2. The function pq_putbytes shouldn't be used by anyone other than old-style COPY out.

+ pq_putbytes(msg, strlen(msg));

Otherwise, the following assert will fail in the same function:

/* Should only be called by old-style COPY OUT */
Assert(DoingCopyOut);

--
Thanks & Regards,
Kuntal Ghosh
Amazon Web Services
> On 11 Feb 2021, at 16:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Maybe there is some useful thing that can be accomplished here, but we
> need to consider the bigger picture rather than believing (without proof)
> that a few hook variables will be enough to do anything.
>
> regards, tom lane

Pluggable wire protocol is a game-changer on its own.

The bigger picture is that the right protocol choice enables large-scale architectural simplifications for whole classes of production applications.

For browser-based applications (LOB, SaaS, e-commerce), having the database server speak the browser protocol enables architectures without backend application code. This in turn leads to significant reductions of latency, complexity, and application development time. And it's not just the lack of backend code: one also profits from all the existing infrastructure like per-query compression/format choice, browser connection management, SSE, multiple streams, prioritization, caching/CDNs, etc.

Don't know if you'd consider it as proof, yet I am seeing 2x to 4x latency reduction in production applications from protocol conversion to HTTP/2. My present solution is a simple connection pooler I built on top of Nginx, transforming the TCP stream as it passes through.

In a recent case, letting the browser talk directly to the database allowed me to get rid of a ~100k-SLOC .NET backend and all the complexity and infrastructure that goes with coding/testing/deploying/maintaining it, while keeping all the positives: per-query compression/data conversion, querying multiple databases over a single connection, session cookies, etc. Deployment is trivial compared to what was before. Latency is down 2x-4x across the board.

Having some production experience with this approach, I can see how an HTTP/2-speaking Postgres would further reduce latency, processing cost, and time-to-interaction for applications.

A similar case can be made for IoT, where one would want to plug in an IoT-optimized protocol.

Again, most of the benefit is possible with a protocol-converting proxy, but there are additional non-trivial performance gains to be had if the database server speaks the right protocol.

While not the only use cases, I'd venture a guess these represent a sizable chunk of what Postgres is used for today, and will be used even more for, so the positive impact of a pluggable protocol would be significant.

--
Damir
Thank you Kuntal,
On Fri, Feb 19, 2021 at 4:36 AM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
On Thu, Feb 18, 2021 at 9:32 PM Jan Wieck <jan@wi3ck.info> wrote:
Few comments in the extension code (although experimental):
1. In telnet_srv.c,
+ static int pe_port;
..
+ DefineCustomIntVariable("telnet_srv.port",
+ "Telnet server port.",
+ NULL,
+ &pe_port,
+ pe_port,
+ 1024,
+ 65536,
+ PGC_POSTMASTER,
+ 0,
+ NULL,
+ NULL,
+ NULL);
The variable pe_port should be initialized to a value which is > 1024
and < 65536. Otherwise, the following assert will fail,
TRAP: FailedAssertion("newval >= conf->min", File: "guc.c", Line:
5541, PID: 12100)
Right, forgot to turn on Asserts.
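Concretely, the fix (my sketch, not part of the posted patch) is just to give the GUC a boot value that already lies inside its declared range, so the guc.c assertion holds:

```c
static int pe_port = 54323;   /* boot value must satisfy 1024 <= v <= 65536 */
```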
2. The function pq_putbytes shouldn't be used by anyone other than
old-style COPY out.
+ pq_putbytes(msg, strlen(msg));
Otherwise, the following assert will fail in the same function:
/* Should only be called by old-style COPY OUT */
Assert(DoingCopyOut);
I would argue that the Assert needs to be changed. The Assert in place is obviously meant to guard against direct use of pq_putbytes(), forcing all code to use pq_putmessage() instead. This is good when speaking the libpq wire protocol, since all messages there are prefixed with a one-byte message type. It does not apply to other protocols.
I propose to create another global boolean IsNonLibpqFrontend which the protocol extension will set to true when accepting the connection and the above then will change to
Assert(DoingCopyOut || IsNonLibpqFrontend);
Regards, Jan
--
Thanks & Regards,
Kuntal Ghosh
Amazon Web Services
Jan Wieck
On 19/02/2021 14:29, Damir Simunic wrote:
>> On 11 Feb 2021, at 16:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Maybe there is some useful thing that can be accomplished here, but
>> we need to consider the bigger picture rather than believing
>> (without proof) that a few hook variables will be enough to do
>> anything.
>
> Pluggable wire protocol is a game-changer on its own.
>
> The bigger picture is that a right protocol choice enables
> large-scale architectural simplifications for whole classes of
> production applications.
>
> For browser-based applications (lob, saas, e-commerce), having the
> database server speak the browser protocol enables architectures
> without backend application code. This in turn leads to significant
> reductions of latency, complexity, and application development time.
> And it's not just lack of backend code: one also profits from all the
> existing infrastructure like per-query compression/format choice,
> browser connection management, sse, multiple streams, prioritization,
> caching/cdns, etc.
>
> Don't know if you'd consider it as a proof, yet I am seeing 2x to 4x
> latency reduction in production applications from protocol conversion
> to http/2. My present solution is a simple connection pooler I built
> on top of Nginx transforming the tcp stream as it passes through.

I can see value in supporting different protocols. I don't like the approach discussed in this thread, however.

For example, there has been discussion elsewhere about integrating connection pooling into the server itself. For that, you want to have a custom process that listens for incoming connections, and launches backends independently of the incoming connections. These hooks would not help with that.

Similarly, if you want to integrate a web server into the database server, you probably also want some kind of connection pooling. A one-to-one relationship between HTTP connections and backend processes doesn't seem nice.

With the hooks that exist today, would it be possible to write a background worker that listens on a port, instead of postmaster? Can you launch backends from a background worker? And communicate with the backend processes using a shared memory message queue (see pqmq.c)?

I would recommend this approach: write a separate program that sits between the client and PostgreSQL, speaking the custom protocol to the client, and libpq to the backend. And then move that program into a background worker process.

> In a recent case, letting the browser talk directly to the database
> allowed me to get rid of a ~100k-sloc .net backend and all the
> complexity and infrastructure that goes with
> coding/testing/deploying/maintaining it, while keeping all the
> positives: per-query compression/data conversion, querying multiple
> databases over a single connection, session cookies, etc. Deployment
> is trivial compared to what was before. Latency is down 2x-4x across
> the board.

Querying multiple databases over a single connection is not possible with the approach taken here. Not sure about the other things you listed.

- Heikki
On Fri, Feb 19, 2021 at 8:48 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
With the hooks that exist today, would it possible to write a background
worker that listens on a port, instead of postmaster? Can you launch
backends from a background worker? And communicate the backend processes
using a shared memory message queue (see pqmq.c).
Yes. That's similar to how mine work: a background worker acts as a listener for the new protocol, sets up a new dynamic background worker on accept(), waits for its creation, passes the fd to the new background worker, and then sits in a while (!got_sigterm) loop checking the socket for activity and running the protocol, similar to postmaster. I haven't looked at the latest connection pooling patches but, in general, connection pooling is an abstract issue and should be usable with any type of connection; realistically, it's just an event loop and state problem, and it shouldn't be protocol-specific.
I would recommend this approach: write a separate program that sits
between the client and PostgreSQL, speaking custom protocol to the
client, and libpq to the backend. And then move that program into a
background worker process.
Doing protocol conversion between libpq and a different protocol works, but is slow. My implementations were originally all proxies that worked outside the database, then I moved them inside, then I replaced all the libpq code with SPI-related calls.
> In a recent case, letting the browser talk directly to the database
> allowed me to get rid of a ~100k-sloc .net backend and all the
> complexity and infrastructure that goes with
> coding/testing/deploying/maintaining it, while keeping all the
> positives: per-query compression/data conversion, querying multiple
> databases over a single connection, session cookies, etc. Deployment
> is trivial compared to what was before. Latency is down 2x-4x across
> the board.
Querying multiple databases over a single connection is not possible
with the approach taken here. Not sure about the others things you listed.
Accessing multiple databases from the same backend is problematic overall - I didn't solve that in my implementations either. IIRC, once a bgworker is attached to a specific database, it's basically stuck with that database.
Jonah H. Harris
On 2/19/21 8:48 AM, Heikki Linnakangas wrote:
> I can see value in supporting different protocols. I don't like the
> approach discussed in this thread, however.
>
> For example, there has been discussion elsewhere about integrating
> connection pooling into the server itself. For that, you want to have a
> custom process that listens for incoming connections, and launches
> backends independently of the incoming connections. These hooks would
> not help with that.

The two are not mutually exclusive. You are right that the current proposal would not help with that type of built-in connection pool, but it may be extended to that.

Give the function that postmaster calls to accept a connection when a server_fd is ready a return code it can use to tell postmaster "forget about it, don't fork or do anything else with it". This function normally calls StreamConnection() before the postmaster then forks the backend. But it could instead hand the socket over to the pool background worker (I presume Jonah is transferring them from process to process via UDP packet). The pool worker then launches the actual backends, which receive a requesting client via the same socket transfer, perform one or more transactions, then hand the socket back to the pool worker. All of that would still require a protocol extension that has special messages for "here is a client socket for you" and "you can have that back".

> I would recommend this approach: write a separate program that sits
> between the client and PostgreSQL, speaking custom protocol to the
> client, and libpq to the backend. And then move that program into a
> background worker process.

That is a classic protocol-converting proxy. It has been done in the past with not really good results, both performance-wise and with respect to protocol completeness.

Regards, Jan

--
Jan Wieck
Principal Database Engineer
Amazon Web Services
> On 19 Feb 2021, at 14:48, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> For example, there has been discussion elsewhere about integrating
> connection pooling into the server itself. For that, you want to have
> a custom process that listens for incoming connections, and launches
> backends independently of the incoming connections. These hooks would
> not help with that.

Not clear how connection pooling in the core is linked to discussing pluggable wire protocols.

> Similarly, if you want to integrate a web server into the database
> server, you probably also want some kind of connection pooling. A
> one-to-one relationship between HTTP connections and backend
> processes doesn't seem nice.

HTTP/2 is just a protocol, not unlike FE/BE, that has a one-to-one relationship to backend processes as it stands. It shuttles data back and forth in query/response exchanges, and happens to be used by web servers and web browsers, among other things. My mentioning of it was simply an example I can speak of from experience, as opposed to speculating. Could have brought up any other wire protocol if I had experience with it, say MQTT.

To make it clear, "a pluggable wire protocol" as discussed here is a set of rules that defines how data is transmitted: what the requests and responses are, how the data is laid out on the wire, what to do in case of error, etc. Nothing to do with a web server; why would one want to integrate that in the database, anyway?

The intended contribution to the discussion of the big picture of pluggable wire protocols is that there are significant use cases where the protocol choice is restricted on the client side, and allowing a pluggable wire protocol on the server side brings tangible benefits in performance and architectural simplification. That's all. The rest were supporting facts that hopefully can also serve as a counterpoint to "pluggable wire protocol is primarily useful to make Postgres pretend to be MySQL."

Protocol conversion HTTP/2 <-> FE/BE on the connection pooler already brings a lot of the mentioned benefits, and I'm satisfied with it. Beyond that, I'm simply supporting the idea of pluggable protocols, as experience so far allows me to see advantages that might sound theoretical to someone who has never tried this scenario in production.

Glad to offer a couple of examples where I see potential for performance gains from having such a wire protocol pluggable in the core. Let me know if you want me to elaborate.

> Querying multiple databases over a single connection is not possible
> with the approach taken here.

Indeed, querying multiple databases over a single connection is something you need a proxy for, and a different client protocol from FE/BE. No need to mix that with the talk about pluggable wire protocols.

My mentioning of it was in the sense "a lot of LoB backend code is nothing more than a bloated protocol converter that happens to also allow connecting to multiple databases from a single client connection => letting the client speak to the database [through a proxy in this case] removed the bloated source of latency but kept the advantages."

--
Damir