Thread: Proposal to add --single-row to psql
psql currently collects all the result rows of a query in memory before writing them to a file, which can cause out-of-memory problems for large results in low-memory environments like EC2. I can't use COPY TO STDOUT or FETCH_COUNT since I'm using Redshift, which doesn't support [writing to STDOUT](http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html) or [CURSOR](https://forums.aws.amazon.com/thread.jspa?threadID=122664).
[Single Row Mode is available in Postgres 9.2](http://www.postgresql.org/docs/9.2/static/libpq-single-row-mode.html) but [it doesn't look like](http://www.postgresql.org/docs/9.2/static/app-psql.html) you can tell psql to use single row mode when writing to a file (using --output).
I'm proposing to add a --single-row option to psql that would allow the result rows of a query to be streamed to a file without collecting them in memory first.
I'm new to the Postgres source, but I was considering doing this by adding an `else if` branch at [this line in bin/psql/common.c](https://github.com/postgres/postgres/blob/master/src/bin/psql/common.c#L955) that would call [PQsetSingleRowMode](https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/fe-exec.c#L1581) and ideally use something very similar to [ExecQueryUsingCursor](https://github.com/postgres/postgres/blob/master/src/bin/psql/common.c#L1081).
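For context, the libpq flow the proposal builds on looks roughly like this. This is a minimal sketch, not psql's actual code: the connection string, table name, and output path are hypothetical, and error handling is mostly elided.

```c
/* Sketch of libpq single-row mode: rows are fetched and written one at a
 * time instead of being accumulated in one big PGresult in memory.
 * Connection string, query, and output path are hypothetical. */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=mydb");
    if (PQstatus(conn) != CONNECTION_OK)
        exit(1);

    if (!PQsendQuery(conn, "SELECT * FROM big_table"))
        exit(1);
    /* must be called right after PQsendQuery, before any PQgetResult */
    if (!PQsetSingleRowMode(conn))
        exit(1);

    FILE *out = fopen("result.txt", "w");
    PGresult *res;
    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
        {
            /* each PGresult holds exactly one row in this mode */
            int ncols = PQnfields(res);
            for (int i = 0; i < ncols; i++)
                fprintf(out, "%s%s", PQgetvalue(res, 0, i),
                        i < ncols - 1 ? "|" : "\n");
        }
        /* the final PGresult is PGRES_TUPLES_OK with zero rows */
        PQclear(res);
    }
    fclose(out);
    PQfinish(conn);
    return 0;
}
```

Since only one row is ever materialized client-side, peak memory stays flat regardless of result size, which is the whole point of the proposed option.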
Please let me know if that would be an acceptable addition and if there's anything in particular I should be aware of when adding the feature.
Thank you,
Christopher
Hello
It is redundant with the current FETCH_COUNT implementation, so I don't see the sense in using them together. Maybe we can drop FETCH_COUNT and replace it with a --single-row mode; that could probably simplify the code.
Christopher Manning <c@christophermanning.org> writes:
> I'm proposing to add a --single-row option to psql that would allow the
> result rows of a query to be streamed to a file without collecting them in
> memory first.

Isn't there already a way to set FETCH_COUNT from the command line?
(ie, I think there's a generic variable-assignment facility that could
do this)

			regards, tom lane
On Tue, Apr 23, 2013 at 1:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Isn't there already a way to set FETCH_COUNT from the command line?
> (ie, I think there's a generic variable-assignment facility that could
> do this)
>

Christopher,

Tom is all right... from psql [1] command line we can do that:

$ bin/psql --variable=FETCH_COUNT=100
psql (9.3devel)
Type "help" for help.

fabrizio=# \echo :FETCH_COUNT
100

Regards,

[1] http://www.postgresql.org/docs/current/interactive/app-psql.html#AEN84903

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
Fabrízio and Tom,
I know that you can use --variable="FETCH_COUNT=10000" from the psql command line, but internally that uses a CURSOR to batch the rows and [Redshift doesn't support CURSOR](https://forums.aws.amazon.com/thread.jspa?threadID=122664&tstart=0) so it's not an option when using psql to download data from Redshift.
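For contrast with single-row mode, what FETCH_COUNT does internally is roughly this cursor loop (a hedged sketch modeled loosely on psql's ExecQueryUsingCursor; the helper name and error handling are simplifications). This DECLARE/FETCH traffic is exactly what Redshift rejects:

```c
/* Rough shape of psql's FETCH_COUNT batching: wrap the query in a cursor
 * and FETCH in chunks, so at most fetch_count rows sit in memory at once.
 * Redshift's lack of DECLARE ... CURSOR is what rules this path out. */
#include <stdio.h>
#include <libpq-fe.h>

/* Simplified helper; assumes conn is open and no transaction is active. */
static void fetch_in_batches(PGconn *conn, const char *query, int fetch_count)
{
    char buf[1024];

    PQclear(PQexec(conn, "BEGIN"));
    snprintf(buf, sizeof(buf),
             "DECLARE _psql_cursor NO SCROLL CURSOR FOR %s", query);
    PQclear(PQexec(conn, buf));

    snprintf(buf, sizeof(buf),
             "FETCH FORWARD %d FROM _psql_cursor", fetch_count);
    for (;;)
    {
        PGresult *res = PQexec(conn, buf);
        int ntuples = PQntuples(res);
        for (int r = 0; r < ntuples; r++)
            printf("%s\n", PQgetvalue(res, r, 0)); /* first column only */
        PQclear(res);
        if (ntuples < fetch_count)  /* short batch: cursor is exhausted */
            break;
    }
    PQclear(PQexec(conn, "CLOSE _psql_cursor"));
    PQclear(PQexec(conn, "COMMIT"));
}
```

Single-row mode would achieve the same bounded memory use purely on the client side, with no server-side cursor support required.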
Pavel's idea of having a single-row mode option replace FETCH_COUNT is interesting; does anyone see problems with that, or have alternative ideas?
Regards,
Christopher
On Wed, Apr 24, 2013 at 7:04 AM, Fabrízio de Royes Mello <fabriziomello@gmail.com> wrote:
> Christopher,
>
> On Tue, Apr 23, 2013 at 1:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Isn't there already a way to set FETCH_COUNT from the command line?
> > (ie, I think there's a generic variable-assignment facility that could
> > do this)
> >
>
> Tom is all right... from psql [1] command line we can do that:
>
> $ bin/psql --variable=FETCH_COUNT=100
> psql (9.3devel)
> Type "help" for help.
>
> fabrizio=# \echo :FETCH_COUNT
> 100
>
> Regards,
>
> [1] http://www.postgresql.org/docs/current/interactive/app-psql.html#AEN84903
>
> --
> Fabrízio de Royes Mello
> Consultoria/Coaching PostgreSQL
> >> Blog sobre TI: http://fabriziomello.blogspot.com
> >> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
> >> Twitter: http://twitter.com/fabriziomello
Christopher Manning <c@christophermanning.org> writes:
> Fabrízio and Tom,
> I know that you can use --variable="FETCH_COUNT=10000" from the
> psql command line, but internally that uses a CURSOR to batch the rows and
> [Redshift doesn't support CURSOR](
> https://forums.aws.amazon.com/thread.jspa?threadID=122664&tstart=0) so it's
> not an option when using psql to download data from Redshift.

I don't know what redshift is, but I have very little patience with the
idea of burdening psql with yet another random feature in order to work
around deficiencies in somebody else's software. Considering that the
earliest any such thing could reach the field would be 9.4, it seems not
unlikely that the need for it would be gone by next year anyway.

			regards, tom lane
On 04/25/2013 10:42 AM, Tom Lane wrote:
> Christopher Manning <c@christophermanning.org> writes:
>> Fabrízio and Tom,
>> I know that you can use --variable="FETCH_COUNT=10000" from the
>> psql command line, but internally that uses a CURSOR to batch the rows and
>> [Redshift doesn't support CURSOR](
>> https://forums.aws.amazon.com/thread.jspa?threadID=122664&tstart=0) so it's
>> not an option when using psql to download data from Redshift.
>
> I don't know what redshift is,

It's an Amazon product based on release 8.0, but with many many features removed (e.g. indexes!)

> but I have very little patience with the
> idea of burdening psql with yet another random feature in order to work
> around deficiencies in somebody else's software. Considering that the
> earliest any such thing could reach the field would be 9.4, it seems not
> unlikely that the need for it would be gone by next year anyway.

Plus there is the fact that we have no way to test it against Redshift anyway. It should be up to Amazon to produce a useful psql program that works with Redshift, not us. We have enough to do to support our own product. If this is to be justified at all, it needs to be without reference to Redshift.

cheers

andrew
On 04/25/2013 07:42 AM, Tom Lane wrote:
> Christopher Manning <c@christophermanning.org> writes:
>> Fabrízio and Tom,
>> I know that you can use --variable="FETCH_COUNT=10000" from the
>> psql command line, but internally that uses a CURSOR to batch the rows and
>> [Redshift doesn't support CURSOR](
>> https://forums.aws.amazon.com/thread.jspa?threadID=122664&tstart=0) so it's
>> not an option when using psql to download data from Redshift.
>
> I don't know what redshift is,

It is a PostgreSQL fork based on ancient source code. From Amazon:

"Amazon Redshift is based on PostgreSQL 8.0.2. Amazon Redshift and PostgreSQL have a number of very important differences that you must be aware of as you design and develop your data warehouse applications."

> but I have very little patience with the
> idea of burdening psql with yet another random feature in order to work
> around deficiencies in somebody else's software. Considering that the
> earliest any such thing could reach the field would be 9.4, it seems not
> unlikely that the need for it would be gone by next year anyway.

+1 this is really an Amazon problem, not a PostgreSQL problem.

Sincerely,

Joshua D. Drake

--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579
On 04/25/2013 10:55 PM, Andrew Dunstan wrote:
> It's an Amazon product based on release 8.0, but with many many features
> removed (e.g. Indexes!)

More specifically, it's a hacked-up, column-store-ized Pg for OLAP and analytics work. As I understand it Amazon didn't develop it themselves; they bought/licensed ParAccel (http://www.paraccel.com/) and customised it to integrate with Amazon's IAM authentication/authorization, usage accounting, dynamic provisioning, etc.

http://docs.aws.amazon.com/redshift/latest/dg/c_redshift-and-postgres-sql.html
http://en.wikipedia.org/wiki/ParAccel

I find it frustrating that I've never seen an @paraccel email address here and that few of the other vendors of highly customised Pg offshoots are contributing back. It's almost enough to make me like the GPL.

--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 4/28/13 7:50 AM, Craig Ringer wrote:
> I find it frustrating that I've never seen an @paraccel email address here
> and that few of the other vendors of highly customised Pg offshoots are
> contributing back. It's almost enough to make me like the GPL.

FWIW, I think there's a pretty large barrier to these folks contributing back. Would the community really want to add a bunch of hooks to support something like Redshift? Or Greenplum? Or etc, etc.? Most of these guys have to change significant amounts of PG code, so much so that it's actually hard for them to stay current (which is why most of them just fork).

I do think this is a shame, but I'm not sure of any good way to fix it.

--
Jim C. Nasby, Data Architect   jim@nasby.net
512.569.9461 (cell)            http://jim.nasby.net
On Mon, Apr 29, 2013 at 4:33 PM, Jim Nasby <jim@nasby.net> wrote:
> On 4/28/13 7:50 AM, Craig Ringer wrote:
>> I find it frustrating that I've never seen an @paraccel email address here
>> and that few of the other vendors of highly customised Pg offshoots are
>> contributing back. It's almost enough to make me like the GPL.
>
> FWIW, I think there's a pretty large barrier to these folks contributing
> back. Would the community really want to add a bunch of hooks to support
> something like Redshift? Or Greenplum? Or etc, etc.? Most of these guys have
> to change significant amounts of PG code, so much so that it's actually hard
> for them to stay current (which is why most of them just fork).
>
> I do think this is a shame, but I'm not sure of any good way to fix it.

Yep. There are plenty of things that we do at EDB for good and valid business reasons that I can't imagine the community accepting under any circumstances. For example, Oracle compatibility is not something the community values as highly as EnterpriseDB (and our customers) do. I'm sure that many of those vendors are in similar situations - they write code that only runs on specialized hardware, or (rather commonly, I suspect) they remove parts of the functionality in order to make certain things very fast. Those are not trade-offs that make sense for PostgreSQL, but I find it hard to understand what we'd gain from preventing other people from making them.

There are in fact a pretty large number of companies - EnterpriseDB, obviously, but there are many, many others - that are choosing to build businesses around PostgreSQL precisely because it *isn't* GPL. Personally, I think that's a good thing for our community in terms of mindshare even when companies choose not to contribute back - and it's even better when they do.

I was at the MySQL show a couple of years back and they had a vendor area. And I got talking to one of the vendors there who had implemented some new kind of database - I forget the details exactly - and he told me that it spoke the PostgreSQL wire protocol. I was of course a bit surprised, as PostgreSQL was not exactly what people at this show were excited about. So I asked him why not the MySQL wire protocol, and he said, basically, that they were afraid of being accused of a GPL violation, because MySQL AB had a very expansive interpretation of what the GPL did and did not allow. We could perhaps argue about whether he was right to be worried about that... but he was. I can't think of all the names right now, but I've talked with a bunch of other companies over the last few years who also chose PostgreSQL for licensing reasons. I'm pretty happy about that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> I find it frustrating that I've never seen an @paraccel email address
> here and that few of the other vendors of highly customised Pg offshoots
> are contributing back. It's almost enough to make me like the GPL.

Well, ParAccel never ended up contributing any code, but in years back (2006-2008) they did contribute money to PostgreSQL (event sponsorships, SPI). I actually talked to them about contributing code, but somehow it never happened.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 05/02/2013 08:36 AM, Robert Haas wrote:
> I've talked with a bunch of other companies over the last few years
> who also chose PostgreSQL for licensing reasons. I'm pretty happy
> about that.

I think it's a pretty good thing too, personally ... but I do wish they'd contribute a bit more to the client-side drivers like psqlODBC and PgJDBC.

--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Robert Haas <robertmhaas@gmail.com> writes:
> ... There are in fact a
> pretty large number of companies - EnterpriseDB, obviously, but there
> are many, many others - that are choosing to build businesses around
> PostgreSQL precisely because it *isn't* GPL. Personally, I think
> that's a good thing for our community in terms of mindshare even when
> companies choose not to contribute back - and it's even better when
> they do.

Yeah, exactly. People forking PG may not contribute much directly, but they still add to mindshare and general familiarity with the PG code base. In the long run, that cannot help but add to our goal of world domination ;-)

We had the GPL-vs-BSD-license discussion in excruciating length years ago, mid-2000 if memory serves, so you can find a whole lot more about this topic in the mailing list archives from back then. But personally I wouldn't consider changing even if it were somehow practical. The license choice has served us well and will continue to do so.

			regards, tom lane
On 5/1/13 7:36 PM, Robert Haas wrote:
> On Mon, Apr 29, 2013 at 4:33 PM, Jim Nasby <jim@nasby.net> wrote:
>> On 4/28/13 7:50 AM, Craig Ringer wrote:
>>> I find it frustrating that I've never seen an @paraccel email address here
>>> and that few of the other vendors of highly customised Pg offshoots are
>>> contributing back. It's almost enough to make me like the GPL.
>>
>> FWIW, I think there's a pretty large barrier to these folks contributing
>> back. Would the community really want to add a bunch of hooks to support
>> something like Redshift? Or Greenplum? Or etc, etc.? Most of these guys have
>> to change significant amounts of PG code, so much so that it's actually hard
>> for them to stay current (which is why most of them just fork).
>>
>> I do think this is a shame, but I'm not sure of any good way to fix it.
>
> Yep. There are plenty of things that we do at EDB for good and valid
> business reasons that I can't imagine the community accepting under
> any circumstances. For example, Oracle compatibility is not something
> the community values as highly as EnterpriseDB (and our customers) do.
> I'm sure that many of those vendors are in similar situations - they
> write code that only runs on specialized hardware, or (rather
> commonly, I suspect) they remove parts of the functionality in order
> to make certain things very fast. Those are not trade-offs that make
> sense for PostgreSQL, but I find it hard to understand what we'd gain
> from preventing other people from making them. There are in fact a
> pretty large number of companies - EnterpriseDB, obviously, but there
> are many, many others - that are choosing to build businesses around
> PostgreSQL precisely because it *isn't* GPL. Personally, I think
> that's a good thing for our community in terms of mindshare even when
> companies choose not to contribute back - and it's even better when
> they do.

FWIW, one point I was trying to make that was overlooked is that it seems to be exceptionally difficult for companies to fork Postgres and then stay current (AFAIK EnterpriseDB and Mammoth are the only products that have pulled that feat off). I believe that makes it significantly harder for them to actually contribute code back that doesn't give them a business advantage, as well as making it harder to justify hacking on the community codebase, because they'll just face a very large hurdle when it comes to pulling that code back into their proprietary product.

I don't know of any good way to solve that problem. Maybe it's not worth solving... but I do suspect there's some useful stuff that the community has lost out on because of this.

--
Jim C. Nasby, Data Architect   jim@nasby.net
512.569.9461 (cell)            http://jim.nasby.net
On Wed, May 08, 2013 at 06:08:28PM -0500, Jim Nasby wrote:
> FWIW, one point I was trying to make that was overlooked is that it
> seems to be exceptionally difficult for companies to fork Postgres
> and then stay current (AFAIK EnterpriseDB and Mammoth are the only
> products that have pulled that feat off).

VMware and CitusDB are doing pretty well so far, but it's early days.

> I believe that makes it significantly harder for them to actually
> contribute code back that doesn't give them a business advantage, as
> well as making it harder to justify hacking on the community
> codebase because they'll just face a very large hurdle when it comes
> to pulling that code back into their proprietary product.
>
> I don't know of any good way to solve that problem. Maybe it's not
> worth solving... but I do suspect there's some useful stuff that the
> community has lost out on because of this.

Some of this is getting solved by making PostgreSQL more pluggable in ways that isolate the proprietary stuff, i.e. make people not have to touch the PostgreSQL core code much, if at all, in order to provide whatever special features they provide. Hooks and FDWs are two such pluggable components.

Cheers,
David.

--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
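The hook mechanism mentioned above follows a well-known pattern: a loadable module saves the previous hook pointer in _PG_init() and chains to it. A minimal sketch against the 9.2-era executor API (the function names `my_ExecutorRun` etc. are hypothetical; this compiles only inside a PostgreSQL extension build):

```c
/* Sketch of the standard PostgreSQL hook-chaining pattern: an extension
 * intercepts ExecutorRun, does its own work, then delegates to whichever
 * hook (or the standard implementation) was installed before it. */
#include "postgres.h"
#include "fmgr.h"
#include "executor/executor.h"

PG_MODULE_MAGIC;

static ExecutorRun_hook_type prev_ExecutorRun = NULL;

static void
my_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, long count)
{
    /* custom behavior before the query runs, e.g. logging or accounting */
    elog(LOG, "running: %s", queryDesc->sourceText);

    if (prev_ExecutorRun)
        prev_ExecutorRun(queryDesc, direction, count);  /* chain */
    else
        standard_ExecutorRun(queryDesc, direction, count);
}

void
_PG_init(void)
{
    /* save whatever hook was installed before us, then install ours */
    prev_ExecutorRun = ExecutorRun_hook;
    ExecutorRun_hook = my_ExecutorRun;
}
```

Because each module saves and chains to the previous hook, several extensions can stack on the same hook point without touching core code, which is the isolation property being discussed here.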
On Thu, May 9, 2013 at 9:12 AM, David Fetter <david@fetter.org> wrote:
> On Wed, May 08, 2013 at 06:08:28PM -0500, Jim Nasby wrote:
>> I believe that makes it significantly harder for them to actually
>> contribute code back that doesn't give them a business advantage, as
>> well as making it harder to justify hacking on the community
>> codebase because they'll just face a very large hurdle when it comes
>> to pulling that code back into their proprietary product.
>>
>> I don't know of any good way to solve that problem. Maybe it's not
>> worth solving... but I do suspect there's some useful stuff that the
>> community has lost out on because of this.
>
> Some of this is getting solved by making PostgreSQL more pluggable in
> ways that isolate the proprietary stuff, i.e. make people not have to
> touch the PostgreSQL core code much, if at all, in order to provide
> whatever special features they provide. Hooks and FDWs are two such
> pluggable components.

Extensions and the lovely background workers as well.
--
Michael
On Thu, May 9, 2013 at 7:36 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
>> Some of this is getting solved by making PostgreSQL more pluggable in
>> ways that isolate the proprietary stuff, i.e. make people not have to
>> touch the PostgreSQL core code much, if at all, in order to provide
>> whatever special features they provide. Hooks and FDWs are two such
>> pluggable components.
>
> Extensions and the lovely background workers as well.

Some kind of extendable parser would be awesome. It would need to tie into the rewriter also.

No, I don't have a clue what the design looks like.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, May 11, 2013 at 11:17:03AM -0400, Robert Haas wrote:
> On Thu, May 9, 2013 at 7:36 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>>> Some of this is getting solved by making PostgreSQL more pluggable in
>>> ways that isolate the proprietary stuff, i.e. make people not have to
>>> touch the PostgreSQL core code much, if at all, in order to provide
>>> whatever special features they provide. Hooks and FDWs are two such
>>> pluggable components.
>>
>> Extensions and the lovely background workers as well.
>
> Some kind of extendable parser would be awesome. It would need to tie
> into the rewriter also.
>
> No, I don't have a clue what the design looks like.

That's a direction several of the proprietary RDBMS vendors have proposed. I think it'd be great :)

Pre-coffee (yeah, I know. Bad idea.) sketch of an idea: create an API to pass expression trees in and out. This could have other benefits as to clustering space, shortening the planning cycle, etc., but let's not go there for now. My knowledge is very, very sketchy, but when I squint, the expression trees we use look a lot like JSON. Are they isomorphic?

Cheers,
David.

--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes:
> On Sat, May 11, 2013 at 11:17:03AM -0400, Robert Haas wrote:
>> Some kind of extendable parser would be awesome. It would need to tie
>> into the rewriter also.
>>
>> No, I don't have a clue what the design looks like.

> That's a direction several of the proprietary RDBMS vendors have
> proposed. I think it'd be great :)

> Pre-coffee (yeah, I know. Bad idea.) sketch of an idea: create an API
> to pass expression trees in and out. This could have other benefits
> as to clustering space, shortening the planning cycle, etc., but let's
> not go there for now. My knowledge is very, very sketchy, but when I
> squint, the expression trees we use look a lot like JSON. Are they
> isomorphic?

By the time you've got an expression tree, the problem is mostly solved, at least so far as parser extension is concerned.

More years ago than I care to admit, I worked on systems that had run-time-extensible parsers at Hewlett-Packard, so technology for this does exist. But my (vague) memory of those systems is that the parser's language capabilities were more limited than bison's, perhaps only LL(1). Parsing spec-compatible SQL that way might be a challenge.

A larger issue is that if you don't have the whole grammar available to check, it's difficult to be sure there are no parsing conflicts. I seem to remember that we hit some conflicts between different extension ROMs back at HP :-(

Another point is that extensions that are actually interesting require a lot more than new syntax. Robert mentioned the rewriter, but you'd typically need planner and executor additions as well. It's possible to see how whole new plan node types might be added by a plugin so far as the executor is concerned, but I haven't a clue how we'd get the planner to emit them ...

			regards, tom lane
On Sat, May 11, 2013 at 12:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> By the time you've got an expression tree, the problem is mostly solved,
> at least so far as parser extension is concerned.

Right.

> More years ago than I care to admit, I worked on systems that had
> run-time-extensible parsers at Hewlett-Packard, so technology for this
> does exist. But my (vague) memory of those systems is that the parser's
> language capabilities were more limited than bison's, perhaps only
> LL(1). Parsing spec-compatible SQL that way might be a challenge.

If I understand bison correctly, it basically looks at the current parser state and the next token and decides to either shift that token onto the stack or reduce the stack using some rule. If there's no matching rule, we error out. If someone wants to inject new rules into the grammar, those state tables are all going to change. But if we could contrive things so that the state tables are built dynamically and can change as rules are added and removed, then it seems to me that we could let a loadable module supply (or delete) whatever grammar rules it likes. Whenever it does this, we recompile the grammar on next use (and complain if we get ambiguities). This does not sound all that easy to code, but at least in theory it seems doable.

We'd also need a way to add keywords. To be quite frank, I think our whole approach to keywords is massively screwed up right now. Aside from the parser bloat, adding even unreserved keywords breaks stuff, e.g. "SELECT 1 this_is_not_yet_a_keyword". EnterpriseDB's internal bug-tracking system must have at least half a dozen bug reports open right now that are attributable to keywords either existing at all or being more reserved than they are in Oracle. That aside, the amount of trouble we've caused for PostgreSQL users over the years, either by adding new keywords or disallowing them in contexts where they used to work, is not small.

I don't have a specific proposal for what we should do to make this problem less painful, but I think if we ever revise our parser infrastructure we ought to use that as an opportunity to try to come up with something better than what exists today, because the status quo is awfully painful.

> Another point is that extensions that are actually interesting require
> a lot more than new syntax. Robert mentioned the rewriter, but you'd
> typically need planner and executor additions as well. It's possible to
> see how whole new plan node types might be added by a plugin so far as
> the executor is concerned, but I haven't a clue how we'd get the planner
> to emit them ...

I don't have any ideas either. Still, solving some of the problems would be better than solving none of them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 5/11/13 11:27 AM, Tom Lane wrote: > David Fetter <david@fetter.org> writes: >> On Sat, May 11, 2013 at 11:17:03AM -0400, Robert Haas wrote: >>> Some kind of extendable parser would be awesome. It would need to tie >>> into the rewriter also. >>> >>> No, I don't have a clue what the design looks like. > >> That's a direction several of the proprietary RDBMS vendors have >> proposed. I think it'd be great :) > >> Pre-coffee (yeah, I know. Bad idea.) sketch of an idea: create an API >> to pass expression trees in and out. This could have other benefits >> as to clustering space, shortening the planning cycle, etc., but let's >> not go there for now. My knowledge is very, very sketchy, but when I >> squint, the expression trees we use look a lot like JSON. Are they >> isomorphic? > > By the time you've got an expression tree, the problem is mostly solved, > at least so far as parser extension is concerned. > > More years ago than I care to admit, I worked on systems that had > run-time-extensible parsers at Hewlett-Packard, so technology for this > does exist. But my (vague) memory of those systems is that the parser's > language capabilities were more limited than bison's, perhaps only > LL(1). Parsing spec-compatible SQL that way might be a challenge. > > A larger issue is that if you don't have the whole grammar available > to check, it's difficult to be sure there are no parsing conflicts. > I seem to remember that we hit some conflicts between different > extension ROMs back at HP :-( > > Another point is that extensions that are actually interesting require > a lot more than new syntax. Robert mentioned the rewriter, but you'd > typically need planner and executor additions as well. It's possible to > see how whole new plan node types might be added by a plugin so far as > the executor is concerned, but I haven't a clue how we'd get the planner > to emit them ... 
FWIW, I've wanted the ability to plug into the parser not for an extension, but so that I could programmatically enforce certain coding conventions. TBH, it would be ideal if that was possible without requiring a backend (i.e., some kind of stand-alone library/tool), but exposing it via a backend would be better than nothing.

--
Jim C. Nasby, Data Architect
jim@nasby.net
512.569.9461 (cell)
http://jim.nasby.net
On Sun, May 12, 2013 at 1:18 PM, Jim Nasby <jim@nasby.net> wrote:
> FWIW, I've wanted the ability to plug into the parser not for an extension,
> but so that I could programmaticly enforce certain coding conventions.

Depending on the exact requirements, that probably wouldn't be too difficult. It'd likely entail adding a hook for plugins to inspect raw parse trees. The pg_stat_statements core infrastructure changes for 9.2 added a hook that allowed pg_stat_statements to inspect the post-parse-analysis (query) tree. That wouldn't be suitable for what you want, though, because that tree loses many of the details that you are likely to care about but are inconsequential to the semantics of the query. For example, you couldn't tell from the post-parse-analysis tree if the query text was originally "select all, the, columns from table" or "select * from table".

--
Peter Geoghegan
On Sunday, May 12, 2013 4:53 AM Robert Haas wrote:
> On Sat, May 11, 2013 at 12:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > By the time you've got an expression tree, the problem is mostly solved,
> > at least so far as parser extension is concerned.
>
> Right.
>
> > More years ago than I care to admit, I worked on systems that had
> > run-time-extensible parsers at Hewlett-Packard, so technology for this
> > does exist.  But my (vague) memory of those systems is that the parser's
> > language capabilities were more limited than bison's, perhaps only
> > LL(1).  Parsing spec-compatible SQL that way might be a challenge.
>
> If I understand bison correctly, it basically looks at the current
> parser state and the next token and decides to either shift that token
> onto the stack or reduce the stack using some rule.  If there's no
> matching rule, we error out.  If someone wants to inject new rules
> into the grammar, those state tables are all going to change.  But if
> we could contrive things so that the state tables are built
> dynamically and can be changed as rules are added and removed, then it
> seems to me that we could let a loadable module supply (or delete)
> whatever grammar rules it likes.  Whenever it does this, we recompile
> the grammar on next use (and complain if we get ambiguities).  This
> does not sound all that easy to code, but at least in theory it seems
> doable.

Could we have a pluggable parser whose responsibility is to produce a raw parse tree with the same structure PostgreSQL uses, but which internally handles a different database's syntax (e.g. Oracle's)?
For example, the Oracle syntaxes below could be converted to PostgreSQL raw parse trees:

Oracle syntax:
    SELECT 1 + 10 * 100 + 1000 from DUAL;
PostgreSQL syntax:
    SELECT 1 + 10 * 100 + 1000;

Oracle syntax:
    CREATE USER xxxx IDENTIFIED BY 'password';
    ALTER USER xxxx IDENTIFIED BY 'password';
PostgreSQL syntax:
    CREATE USER xxxx PASSWORD 'password';
    ALTER USER xxxx PASSWORD 'password';

Currently, anybody migrating from Sybase, Oracle, SQL Server, or MySQL to PostgreSQL has to either use some converter tool or enhance the existing gram.y to support the foreign syntax. If the current PG parser were pluggable, they could write their own parser that produces a PG-specific parse tree for another database's syntax. A user could load a database-specific parser at server start time by specifying the corresponding parser dll/so in shared_preload_libraries.

The advantage of writing a PG-based parser for another database's syntax is that users can directly use script files from the other database, with no need to convert them every time. Though I am not sure, I think 70-80% of commonly used SQL syntax is common among different databases, as all standard databases are SQL-spec compliant, so if such a parser facility were provided, it could save a lot of the conversion time for customers who want to migrate from another database's syntax to PostgreSQL.

With Regards,
Amit Kapila.
On Sat, May 11, 2013 at 07:22:55PM -0400, Robert Haas wrote:
> On Sat, May 11, 2013 at 12:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > By the time you've got an expression tree, the problem is mostly solved,
> > at least so far as parser extension is concerned.
>
> Right.
>
> > More years ago than I care to admit, I worked on systems that had
> > run-time-extensible parsers at Hewlett-Packard, so technology for this
> > does exist.  But my (vague) memory of those systems is that the parser's
> > language capabilities were more limited than bison's, perhaps only
> > LL(1).  Parsing spec-compatible SQL that way might be a challenge.
>
> If I understand bison correctly, it basically looks at the current
> parser state and the next token and decides to either shift that token
> onto the stack or reduce the stack using some rule.  If there's no
> matching rule, we error out.  If someone wants to inject new rules
> into the grammar, those state tables are all going to change.  But if
> we could contrive things so that the state tables are built
> dynamically and can be changed as rules are added and removed, then it
> seems to me that we could let a loadable module supply (or delete)
> whatever grammar rules it likes.  Whenever it does this, we recompile
> the grammar on next use (and complain if we get ambiguities).  This
> does not sound all that easy to code, but at least in theory it seems
> doable.
>
> We'd also need a way to add keywords.  To be quite frank, I think our
> whole approach to keywords is massively screwed up right now.  Aside

FYI, one trick that is often used when parsing C code is to allow parsed code to add to the keyword list, so when the parser goes to get the next token, it might get "keyword" back based on what it has seen already. This is required to handle C typedefs, and might be a method we could use.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ It's impossible for everything to be true. +
I have actually been working on the task discussed in this thread, most relevant parts quoted below, for a while now, and hope to have something concrete that you can use by the end of this summer.

My in-development Muldis D language is homoiconic as a core feature: its source code is data to it, and the native syntax of the language resembles an abstract syntax tree. This tree of nodes primarily defines the behavior of the code, but it also supports arbitrary metadata per node, which for example can be used to preserve concrete syntax for any programming language that can be parsed into Muldis D nodes or conversely generated from them.

For example, you can have one type of node defining a function, whose details are defined by its attributes or child nodes, such as its result type, its parameters, whether it is declared associative/commutative/etc. or not, and the expression(s) defining its body. Another type of node defines a call to a function, another defines a text literal, another a relation literal, and so on. Conversely, a node can define a schema or package etc. Example metadata, which is also structured, could include line numbers, code comments, whitespace, exact numeric literal formats, quoting formats, or exact keyword choices.

Using these node data types, we have a fairly normalized representation of any source code (or data) that is easy to introspect or transform with code.

A key design point of Muldis D is the clear separation of syntax and behavior. A parser is just a function that takes (typically) a character string as input and produces a node (tree) as output. A compiler is just a function or operation that takes a node (tree) as input and produces machine code. An optimizer can be a function that derives one node tree from another, either as its own operation or, typically, as part of the compiler stage.
A compiler or optimizer can generally just ignore the node metadata, but a code generator or debugger can use it; metadata can be stripped without affecting behavior. The canonical tree form can also easily be mapped losslessly to relational forms, such as typical information schemas have.

Practically all behavior is defined in terms of generic type and routine definitions. Practically all system-defined types and routines are defined in one or more libraries/modules/packages that have the same format as those users would write, like extensions. So all the relational operators have the same syntax as, say, string or numeric or array operators.

I envision that the most effective way to use Muldis D to handle an arbitrary programming language, including the native SQL syntax+behavior of each SQL DBMS, is to have a Muldis D library+parser pair for it. For example, to represent PostgreSQL 9.2 most directly, we have a library with an explicitly defined type and routine for every built-in that Pg 9.2 has, and we also have a parser function that takes the SQL syntax that Pg 9.2 understands and produces a Muldis D node tree consisting of calls to the routines of that library or value selectors of types in that library (things like SELECT and INSERT etc. are each mapped to a routine too).

That way, even with a standard node format that isn't specific to a particular language or version, the code for parsing Pg 9.2 SQL has the minimal amount of indirection to deal with, as each syntax element would have a direct library-call counterpart. Similarly, the existing Pg 9.2 SQL compiler would have the least indirection to take said nodes and execute them. (The library would be named e.g. "Postgres_9_2", which makes it easier, say, to also have one side by side for other versions, shims, or legacy code you want to more easily support compatibility with.)
Where one decides to do cross-compilation, say to make Oracle SQL run on Pg, that could be done simply by defining a library for Oracle SQL with the routines+types it has, and then mapping it to a Pg one purely in terms of shim calls (which the compiler can optimize away as applicable); parsers and compilers then never have to deal with behavior-compatibility issues themselves.

I am presently working out the exact set of such language nodes, and making a reference implementation that is mostly self-defined and would compile to Perl code, and hope to have the first Muldis D executable by the end of this summer. I am confident that an adaptation of this design into C or whatever would serve Postgres greatly in letting it effectively parse multiple languages, anywhere from application programming languages to multiple SQL dialects or versions.

Even if you don't decide to use my design specifically (it is open source, and you can influence it), I think you should find some of the general design principles I stated above to be useful: representing behavior as libraries, and making the AST flexible enough for any concrete language without being too specific to the details of one. And of course, cross-invocation of code written in multiple languages is made much easier.

Note, just to be clear, my proposal does not necessitate that all of a node tree be kept in memory at once. This design should be adaptable to a streaming approach, especially as it is expected to handle database dumps or transfers of arbitrary size, the same as SQL engines can today. That is in contrast to what most application languages would probably assume, where everything fits in memory at once. But the ability to stream or not would largely be an implementation detail. Realistically, all code should fit in memory at once, and anything that would have to be buffered out of memory would be, say, embedded data literals (whether large strings or simply large relations).
If you're not sure how my proposal would address any of the needs or wants raised in the thread, go ahead and ask, and I will try and answer as time permits.

-- Darren Duncan

On 2013.05.11 9:27 AM, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
>> On Sat, May 11, 2013 at 11:17:03AM -0400, Robert Haas wrote:
>>> Some kind of extendable parser would be awesome.  It would need to tie
>>> into the rewriter also.
>>>
>>> No, I don't have a clue what the design looks like.
>
>> That's a direction several of the proprietary RDBMS vendors have
>> proposed.  I think it'd be great :)
>
>> Pre-coffee (yeah, I know.  Bad idea.) sketch of an idea: create an API
>> to pass expression trees in and out.  This could have other benefits
>> as to clustering space, shortening the planning cycle, etc., but let's
>> not go there for now.  My knowledge is very, very sketchy, but when I
>> squint, the expression trees we use look a lot like JSON.  Are they
>> isomorphic?
>
> By the time you've got an expression tree, the problem is mostly solved,
> at least so far as parser extension is concerned.
>
> More years ago than I care to admit, I worked on systems that had
> run-time-extensible parsers at Hewlett-Packard, so technology for this
> does exist.  But my (vague) memory of those systems is that the parser's
> language capabilities were more limited than bison's, perhaps only
> LL(1).  Parsing spec-compatible SQL that way might be a challenge.
>
> A larger issue is that if you don't have the whole grammar available
> to check, it's difficult to be sure there are no parsing conflicts.
> I seem to remember that we hit some conflicts between different
> extension ROMs back at HP :-(
>
> Another point is that extensions that are actually interesting require
> a lot more than new syntax.  Robert mentioned the rewriter, but you'd
> typically need planner and executor additions as well.  It's possible to
> see how whole new plan node types might be added by a plugin so far as
> the executor is concerned, but I haven't a clue how we'd get the planner
> to emit them ...