Thread: Proposal to add --single-row to psql

Proposal to add --single-row to psql

From
Christopher Manning
Date:
psql currently collects the query result rows in memory before writing them to a file and can cause out of memory problems for large results in low memory environments like ec2. I can't use COPY TO STDOUT or FETCH_COUNT since I'm using Redshift and it doesn't support [writing to STDOUT](http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html) or [CURSOR](https://forums.aws.amazon.com/thread.jspa?threadID=122664).

[Single Row Mode is available in Postgres 9.2](http://www.postgresql.org/docs/9.2/static/libpq-single-row-mode.html) but [it doesn't look like](http://www.postgresql.org/docs/9.2/static/app-psql.html) you can tell psql to use single row mode when writing to a file (using --output).

I'm proposing to add a --single-row option to psql that would allow the result rows of a query to be streamed to a file without collecting them in memory first. 

I'm new to the postgres source, but I was considering doing this by adding an elseif at [this line in bin/psql/common.c](https://github.com/postgres/postgres/blob/master/src/bin/psql/common.c#L955) that would call [PQsetSingleRowMode](https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/fe-exec.c#L1581) and ideally use something very similar to [ExecQueryUsingCursor](https://github.com/postgres/postgres/blob/master/src/bin/psql/common.c#L1081)

Please let me know if that would be an acceptable addition and if there's anything in particular I should be aware of when adding the feature.

Thank you,
Christopher

Re: Proposal to add --single-row to psql

From
Pavel Stehule
Date:
Hello

It is redundant to current FETCH_COUNT implementation, so I don't see sense to use it together. Maybe we can drop FETCH_COUNT and replace it by --single-row mode and probably it can simplify code.

Regards

Pavel



2013/4/23 Christopher Manning <c@christophermanning.org>
psql currently collects the query result rows in memory before writing them to a file and can cause out of memory problems for large results in low memory environments like ec2. I can't use COPY TO STDOUT or FETCH_COUNT since I'm using Redshift and it doesn't support [writing to STDOUT](http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html) or [CURSOR](https://forums.aws.amazon.com/thread.jspa?threadID=122664).

[Single Row Mode is available in Postgres 9.2](http://www.postgresql.org/docs/9.2/static/libpq-single-row-mode.html) but [it doesn't look like](http://www.postgresql.org/docs/9.2/static/app-psql.html) you can tell psql to use single row mode when writing to a file (using --output).

I'm proposing to add a --single-row option to psql that would allow the result rows of a query to be streamed to a file without collecting them in memory first. 

I'm new to the postgres source, but I was considering doing this by adding an elseif at [this line in bin/psql/common.c](https://github.com/postgres/postgres/blob/master/src/bin/psql/common.c#L955) that would call [PQsetSingleRowMode](https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/fe-exec.c#L1581) and ideally use something very similar to [ExecQueryUsingCursor](https://github.com/postgres/postgres/blob/master/src/bin/psql/common.c#L1081)

Please let me know if that would be an acceptable addition and if there's anything in particular I should be aware of when adding the feature.

Thank you,
Christopher


Re: Proposal to add --single-row to psql

From
Tom Lane
Date:
Christopher Manning <c@christophermanning.org> writes:
> I'm proposing to add a --single-row option to psql that would allow the
> result rows of a query to be streamed to a file without collecting them in
> memory first.

Isn't there already a way to set FETCH_COUNT from the command line?
(ie, I think there's a generic variable-assignment facility that could
do this)
        regards, tom lane



Re: Proposal to add --single-row to psql

From
Fabrízio de Royes Mello
Date:
<div dir="ltr"><br />On Tue, Apr 23, 2013 at 1:05 PM, Tom Lane <<a
href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>>wrote:<br />><br />> Isn't there already a way to set
FETCH_COUNTfrom the command line?<br /> > (ie, I think there's a generic variable-assignment facility that could<br
/>>do this)<br />><br /><br />Christopher,<br /><br />Tom is all right... from psql [1] command line we can do
that:<br/><br />$ bin/psql --variable=FETCH_COUNT=100<br /> psql (9.3devel)<br />Type "help" for help.<br /><br
/>fabrizio=#\echo :FETCH_COUNT<br />100<br /><br />Regards,<br /><br />[1] <a
href="http://www.postgresql.org/docs/current/interactive/app-psql.html#AEN84903">http://www.postgresql.org/docs/current/interactive/app-psql.html#AEN84903</a><br
/><br/>-- <br />Fabrízio de Royes Mello<br />Consultoria/Coaching PostgreSQL<br />>> Blog sobre TI: <a
href="http://fabriziomello.blogspot.com">http://fabriziomello.blogspot.com</a><br/>>> Perfil Linkedin: <a
href="http://br.linkedin.com/in/fabriziomello">http://br.linkedin.com/in/fabriziomello</a><br/> >> Twitter: <a
href="http://twitter.com/fabriziomello">http://twitter.com/fabriziomello</a><br/></div> 

Re: Proposal to add --single-row to psql

From
Christopher Manning
Date:
Fabrízio and Tom,

I know that you can use  --variable="FETCH_COUNT=10000" from the psql command line, but internally that uses a CURSOR to batch the rows and [Redshift doesn't support CURSOR](https://forums.aws.amazon.com/thread.jspa?threadID=122664&tstart=0) so it's not an option when using psql to download data from Redshift.

Pavel's idea of having a single row mode option to replace FETCH_COUNT is interesting, does anyone have any problems with that or alternative ideas?

Regards,
Christopher



On Wed, Apr 24, 2013 at 7:04 AM, Fabrízio de Royes Mello <fabriziomello@gmail.com> wrote:

On Tue, Apr 23, 2013 at 1:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Isn't there already a way to set FETCH_COUNT from the command line?
> (ie, I think there's a generic variable-assignment facility that could
> do this)
>

Christopher,

Tom is all right... from psql [1] command line we can do that:

$ bin/psql --variable=FETCH_COUNT=100
psql (9.3devel)
Type "help" for help.

fabrizio=# \echo :FETCH_COUNT
100

Regards,

[1] http://www.postgresql.org/docs/current/interactive/app-psql.html#AEN84903

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello

Re: Proposal to add --single-row to psql

From
Tom Lane
Date:
Christopher Manning <c@christophermanning.org> writes:
> Fabr�zio and Tom,
> I know that you can use  --variable="FETCH_COUNT=10000" from the
> psql command line, but internally that uses a CURSOR to batch the rows and
> [Redshift doesn't support CURSOR](
> https://forums.aws.amazon.com/thread.jspa?threadID=122664&tstart=0) so it's
> not an option when using psql to download data from Redshift.

I don't know what redshift is, but I have very little patience with the
idea of burdening psql with yet another random feature in order to work
around deficiencies in somebody else's software.  Considering that the
earliest any such thing could reach the field would be 9.4, it seems not
unlikely that the need for it would be gone by next year anyway.
        regards, tom lane



Re: Proposal to add --single-row to psql

From
Andrew Dunstan
Date:
On 04/25/2013 10:42 AM, Tom Lane wrote:
> Christopher Manning <c@christophermanning.org> writes:
>> Fabrízio and Tom,
>> I know that you can use  --variable="FETCH_COUNT=10000" from the
>> psql command line, but internally that uses a CURSOR to batch the rows and
>> [Redshift doesn't support CURSOR](
>> https://forums.aws.amazon.com/thread.jspa?threadID=122664&tstart=0) so it's
>> not an option when using psql to download data from Redshift.
> I don't know what redshift is,


It's an Amazon product based on release 8.0, but with many many features
removed (e.g. Indexes!)

> but I have very little patience with the
> idea of burdening psql with yet another random feature in order to work
> around deficiencies in somebody else's software.  Considering that the
> earliest any such thing could reach the field would be 9.4, it seems not
> unlikely that the need for it would be gone by next year anyway.
>
>


Plus there is the fact that we have no way to test it against redshift
anyway.

It should be up to Amazon to produce a useful psql program that works
with redshift, not us. We have enough to do to support our own product.

If this is to be justified at all it needs to be without reference to
redshift.

cheers

andrew



Re: Proposal to add --single-row to psql

From
"Joshua D. Drake"
Date:
On 04/25/2013 07:42 AM, Tom Lane wrote:
> Christopher Manning <c@christophermanning.org> writes:
>> Fabr�zio and Tom,
>> I know that you can use  --variable="FETCH_COUNT=10000" from the
>> psql command line, but internally that uses a CURSOR to batch the rows and
>> [Redshift doesn't support CURSOR](
>> https://forums.aws.amazon.com/thread.jspa?threadID=122664&tstart=0) so it's
>> not an option when using psql to download data from Redshift.
>
> I don't know what redshift is,

It is a PostgreSQL fork based on ancient source code. From Amazon:

Amazon Redshift is based on PostgreSQL 8.0.2. Amazon Redshift and
PostgreSQL have a number of very important differences that you must be
aware of as you design and develop your data warehouse applications.

> but I have very little patience with the
> idea of burdening psql with yet another random feature in order to work
> around deficiencies in somebody else's software.  Considering that the
> earliest any such thing could reach the field would be 9.4, it seems not
> unlikely that the need for it would be gone by next year anyway.

+1 this is really an amazon problem not a postgresql problem.

Sincerely,

Joshua D. Drake






--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579



Re: Proposal to add --single-row to psql

From
Craig Ringer
Date:
<div class="moz-cite-prefix">On 04/25/2013 10:55 PM, Andrew Dunstan wrote:<br /></div><blockquote
cite="mid:517943E6.4030402@dunslane.net"type="cite"><br /> It's an Amazon product based on release 8.0, but with many
manyfeatures removed (e.g. Indexes!) </blockquote> More specifically, it's a hacked-up column-store-ized Pg for OLAP
andanalytics work. As I understand it Amazon didn't develop it themselves; they bought/licensed Paraccel ( <a
href="http://www.paraccel.com/">http://www.paraccel.com/</a>)and customised to integrated into Amazon's IAM
authentication/authorization,usage accounting, dynamic provisioning, etc.<br /><br /><a
href="http://docs.aws.amazon.com/redshift/latest/dg/c_redshift-and-postgres-sql.html">http://docs.aws.amazon.com/redshift/latest/dg/c_redshift-and-postgres-sql.html</a><br
/><ahref="http://en.wikipedia.org/wiki/ParAccel">http://en.wikipedia.org/wiki/ParAccel</a><br /><br /> I find it
frustratingthat I've never seen an @paraccel email address here and that few of the other vendors of highly customised
Pgoffshoots are contributing back. It's almost enough to make me like the GPL.<br /><pre class="moz-signature"
cols="72">--Craig Ringer                   <a class="moz-txt-link-freetext"
href="http://www.2ndQuadrant.com/">http://www.2ndQuadrant.com/</a>PostgreSQLDevelopment, 24x7 Support, Training &
Services</pre>

Re: Proposal to add --single-row to psql

From
Jim Nasby
Date:
On 4/28/13 7:50 AM, Craig Ringer wrote:
> I find it frustrating that I've never seen an @paraccel email address here and that few of the other vendors of
highlycustomised Pg offshoots are contributing back. It's almost enough to make me like the GPL.
 

FWIW, I think there's a pretty large barrier to these folks contributing back. Would the community really want to add a
bunchof hooks to support something like Redshift? Or Greenplum? Or etc, etc.? Most of these guys have to change
significantamounts of PG code, so much so that it's actually hard for them to stay current (which is why most of them
justfork).
 

I do think this is a shame, but I'm not sure of any good way to fix it.
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: Proposal to add --single-row to psql

From
Robert Haas
Date:
On Mon, Apr 29, 2013 at 4:33 PM, Jim Nasby <jim@nasby.net> wrote:
> On 4/28/13 7:50 AM, Craig Ringer wrote:
>>
>> I find it frustrating that I've never seen an @paraccel email address here
>> and that few of the other vendors of highly customised Pg offshoots are
>> contributing back. It's almost enough to make me like the GPL.
>
> FWIW, I think there's a pretty large barrier to these folks contributing
> back. Would the community really want to add a bunch of hooks to support
> something like Redshift? Or Greenplum? Or etc, etc.? Most of these guys have
> to change significant amounts of PG code, so much so that it's actually hard
> for them to stay current (which is why most of them just fork).
>
> I do think this is a shame, but I'm not sure of any good way to fix it.

Yep.  There are plenty of things that we do at EDB for good and valid
business reasons that I can't imagine the community accepting under
any circumstances.  For example, Oracle compatibility is not something
the community values as highly as EnterpriseDB (and our customers) do.I'm sure that many of those vendors are in
similarsituations - they
 
write code that only runs on specialized hardware, or (rather
commonly, I suspect) they remove parts of the functionality in order
to make certain things very fast.  Those are not trade-offs that make
sense for PostgreSQL, but I find it hard to understand what we'd gain
from preventing other people from making them.  There are in fact a
pretty large number of companies - EnterpriseDB, obviously, but there
are many, many others - that are choosing to build businesses around
PostgreSQL precisely because it *isn't* GPL.  Personally, I think
that's a good thing for our community in terms of mindshare even when
companies choose not to contribute back - and it's even better when
they do.

I was at the MySQL show a couple of years back and they had a vendor
area.  And I got talking to one the vendors there who had implemented
some new kind of database - I forget the details exactly - and he told
me that it spoke the PostgreSQL wire protocol.  I was of course a bit
surprised as PostgreSQL was not exactly what people at this show were
excited about.  So I asked him why not the MySQL wire protocol, and
he, basically, that they were afraid of being accused of a GPL
violation, because MySQL AB had a very expansive interpretation of
what the GPL did and did not allow.  We could perhaps argue about
whether he was right to be worried about that... but he was.  I can't
think of all the names right now, but I've talked with a bunch of
other companies over the last few years who also chose PostgreSQL for
licensing reasons.  I'm pretty happy about that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Proposal to add --single-row to psql

From
Josh Berkus
Date:
> I find it frustrating that I've never seen an @paraccel email address
> here and that few of the other vendors of highly customised Pg offshoots
> are contributing back. It's almost enough to make me like the GPL.

Well, Paraccel never ended up contributing any code, but in years back
(2006-2008) they did contribute money to PostgreSQL (event sponsorships,
SPI).  I actually talked to them about contributing code, but somehow it
never happened.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Proposal to add --single-row to psql

From
Craig Ringer
Date:
On 05/02/2013 08:36 AM, Robert Haas wrote:

> I've talked with a bunch of other companies over the last few years
> who also chose PostgreSQL for licensing reasons. I'm pretty happy
> about that. 

I think it's a pretty good thing too, personally ... but I do wish
they'd contribute a bit more to the client-side drivers like psqlODBC
and PgJDBC.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services




Re: Proposal to add --single-row to psql

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> ... There are in fact a
> pretty large number of companies - EnterpriseDB, obviously, but there
> are many, many others - that are choosing to build businesses around
> PostgreSQL precisely because it *isn't* GPL.  Personally, I think
> that's a good thing for our community in terms of mindshare even when
> companies choose not to contribute back - and it's even better when
> they do.

Yeah, exactly.  People forking PG may not contribute much directly,
but they still add to mindshare and general familiarity with the PG
code base.  In the long run, that cannot help but add to our goal of
world domination ;-)

We had the GPL-vs-BSD-license discussion in excruciating length years
ago, mid-2000 if memory serves, so you can find a whole lot more about
this topic in the mailing list archives from back then.  But personally
I wouldn't consider changing even if it were somehow practical.  The
license choice has served us well and will continue to do so.
        regards, tom lane



Re: Proposal to add --single-row to psql

From
Jim Nasby
Date:
On 5/1/13 7:36 PM, Robert Haas wrote:
> On Mon, Apr 29, 2013 at 4:33 PM, Jim Nasby<jim@nasby.net>  wrote:
>> >On 4/28/13 7:50 AM, Craig Ringer wrote:
>>> >>
>>> >>I find it frustrating that I've never seen an @paraccel email address here
>>> >>and that few of the other vendors of highly customised Pg offshoots are
>>> >>contributing back. It's almost enough to make me like the GPL.
>> >
>> >FWIW, I think there's a pretty large barrier to these folks contributing
>> >back. Would the community really want to add a bunch of hooks to support
>> >something like Redshift? Or Greenplum? Or etc, etc.? Most of these guys have
>> >to change significant amounts of PG code, so much so that it's actually hard
>> >for them to stay current (which is why most of them just fork).
>> >
>> >I do think this is a shame, but I'm not sure of any good way to fix it.
> Yep.  There are plenty of things that we do at EDB for good and valid
> business reasons that I can't imagine the community accepting under
> any circumstances.  For example, Oracle compatibility is not something
> the community values as highly as EnterpriseDB (and our customers) do.
>   I'm sure that many of those vendors are in similar situations - they
> write code that only runs on specialized hardware, or (rather
> commonly, I suspect) they remove parts of the functionality in order
> to make certain things very fast.  Those are not trade-offs that make
> sense for PostgreSQL, but I find it hard to understand what we'd gain
> from preventing other people from making them.  There are in fact a
> pretty large number of companies - EnterpriseDB, obviously, but there
> are many, many others - that are choosing to build businesses around
> PostgreSQL precisely because it*isn't*  GPL.  Personally, I think
> that's a good thing for our community in terms of mindshare even when
> companies choose not to contribute back - and it's even better when
> they do.

FWIW, one point I was trying to make that was overlooked is that it seems to be exceptionally difficult for companies
tofork Postgres and then stay current (AFAIK EnterpriseDB and Mammoth are the only products that have pulled that feat
off).I believe that makes it significantly harder for them to actually contribute code back that doesn't give them a
businessadvantage, as well as making it harder to justify hacking on the community codebase because they'll just face a
verylarge hurdle when it comes to pulling that code back into their proprietary product.
 

I don't know of any good way to solve that problem. Maybe it's not worth solving... but I do suspect there's some
usefulstuff that the community has lost out on because of this.
 
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: Proposal to add --single-row to psql

From
David Fetter
Date:
On Wed, May 08, 2013 at 06:08:28PM -0500, Jim Nasby wrote:
> On 5/1/13 7:36 PM, Robert Haas wrote:
> >On Mon, Apr 29, 2013 at 4:33 PM, Jim Nasby<jim@nasby.net>  wrote:
> >>>On 4/28/13 7:50 AM, Craig Ringer wrote:
> >>>>>
> >>>>>I find it frustrating that I've never seen an @paraccel email address here
> >>>>>and that few of the other vendors of highly customised Pg offshoots are
> >>>>>contributing back. It's almost enough to make me like the GPL.
> >>>
> >>>FWIW, I think there's a pretty large barrier to these folks contributing
> >>>back. Would the community really want to add a bunch of hooks to support
> >>>something like Redshift? Or Greenplum? Or etc, etc.? Most of these guys have
> >>>to change significant amounts of PG code, so much so that it's actually hard
> >>>for them to stay current (which is why most of them just fork).
> >>>
> >>>I do think this is a shame, but I'm not sure of any good way to fix it.
> >Yep.  There are plenty of things that we do at EDB for good and valid
> >business reasons that I can't imagine the community accepting under
> >any circumstances.  For example, Oracle compatibility is not something
> >the community values as highly as EnterpriseDB (and our customers) do.
> >  I'm sure that many of those vendors are in similar situations - they
> >write code that only runs on specialized hardware, or (rather
> >commonly, I suspect) they remove parts of the functionality in order
> >to make certain things very fast.  Those are not trade-offs that make
> >sense for PostgreSQL, but I find it hard to understand what we'd gain
> >from preventing other people from making them.  There are in fact a
> >pretty large number of companies - EnterpriseDB, obviously, but there
> >are many, many others - that are choosing to build businesses around
> >PostgreSQL precisely because it*isn't*  GPL.  Personally, I think
> >that's a good thing for our community in terms of mindshare even when
> >companies choose not to contribute back - and it's even better when
> >they do.
> 
> FWIW, one point I was trying to make that was overlooked is that it
> seems to be exceptionally difficult for companies to fork Postgres
> and then stay current (AFAIK EnterpriseDB and Mammoth are the only
> products that have pulled that feat off).

VMware and CitusDB are doing pretty well so far, but it's early days.

> I believe that makes it significantly harder for them to actually
> contribute code back that doesn't give them a business advantage, as
> well as making it harder to justify hacking on the community
> codebase because they'll just face a very large hurdle when it comes
> to pulling that code back into their proprietary product.
> 
> I don't know of any good way to solve that problem. Maybe it's not
> worth solving... but I do suspect there's some useful stuff that the
> community has lost out on because of this.

Some of this is getting solved by making PostgreSQL more pluggable in
ways that isolate the proprietary stuff, i.e. make people not have to
touch the PostgreSQL core code much, if at all, in order to provide
whatever special features they provide.  Hooks and FDWs are two such
pluggable components.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Proposal to add --single-row to psql

From
Michael Paquier
Date:



On Thu, May 9, 2013 at 9:12 AM, David Fetter <david@fetter.org> wrote:
On Wed, May 08, 2013 at 06:08:28PM -0500, Jim Nasby wrote:
> I believe that makes it significantly harder for them to actually
> contribute code back that doesn't give them a business advantage, as
> well as making it harder to justify hacking on the community
> codebase because they'll just face a very large hurdle when it comes
> to pulling that code back into their proprietary product.
>
> I don't know of any good way to solve that problem. Maybe it's not
> worth solving... but I do suspect there's some useful stuff that the
> community has lost out on because of this.

Some of this is getting solved by making PostgreSQL more pluggable in
ways that isolate the proprietary stuff, i.e. make people not have to
touch the PostgreSQL core code much, if at all, in order to provide
whatever special features they provide.  Hooks and FDWs are two such
pluggable components.
 Extensions and the lovely background workers as well.
--
Michael

Re: Proposal to add --single-row to psql

From
Robert Haas
Date:
On Thu, May 9, 2013 at 7:36 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

>> Some of this is getting solved by making PostgreSQL more pluggable in
>> ways that isolate the proprietary stuff, i.e. make people not have to
>> touch the PostgreSQL core code much, if at all, in order to provide
>> whatever special features they provide.  Hooks and FDWs are two such
>> pluggable components.
>
>  Extensions and the lovely background workers as well.

Some kind of extendable parser would be awesome.  It would need to tie
into the rewriter also.

No, I don't have a clue what the design looks like.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Proposal to add --single-row to psql

From
David Fetter
Date:
On Sat, May 11, 2013 at 11:17:03AM -0400, Robert Haas wrote:
> On Thu, May 9, 2013 at 7:36 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
> 
> >> Some of this is getting solved by making PostgreSQL more pluggable in
> >> ways that isolate the proprietary stuff, i.e. make people not have to
> >> touch the PostgreSQL core code much, if at all, in order to provide
> >> whatever special features they provide.  Hooks and FDWs are two such
> >> pluggable components.
> >
> >  Extensions and the lovely background workers as well.
> 
> Some kind of extendable parser would be awesome.  It would need to tie
> into the rewriter also.
> 
> No, I don't have a clue what the design looks like.

That's a direction several of the proprietary RDBMS vendors have
proposed.  I think it'd be great :)

Pre-coffee (yeah, I know.  Bad idea.) sketch of an idea: create an API
to pass expression trees in and out.  This could have other benefits
as to clustering space, shortening the planning cycle, etc., but let's
not go there for now.  My knowledge is very, very sketchy, but when I
squint, the expression trees we use look a lot like JSON.  Are they
isomorphic?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Proposal to add --single-row to psql

From
Tom Lane
Date:
David Fetter <david@fetter.org> writes:
> On Sat, May 11, 2013 at 11:17:03AM -0400, Robert Haas wrote:
>> Some kind of extendable parser would be awesome.  It would need to tie
>> into the rewriter also.
>> 
>> No, I don't have a clue what the design looks like.

> That's a direction several of the proprietary RDBMS vendors have
> proposed.  I think it'd be great :)

> Pre-coffee (yeah, I know.  Bad idea.) sketch of an idea: create an API
> to pass expression trees in and out.  This could have other benefits
> as to clustering space, shortening the planning cycle, etc., but let's
> not go there for now.  My knowledge is very, very sketchy, but when I
> squint, the expression trees we use look a lot like JSON.  Are they
> isomorphic?

By the time you've got an expression tree, the problem is mostly solved,
at least so far as parser extension is concerned.

More years ago than I care to admit, I worked on systems that had
run-time-extensible parsers at Hewlett-Packard, so technology for this
does exist.  But my (vague) memory of those systems is that the parser's
language capabilities were more limited than bison's, perhaps only
LL(1).  Parsing spec-compatible SQL that way might be a challenge.

A larger issue is that if you don't have the whole grammar available
to check, it's difficult to be sure there are no parsing conflicts.
I seem to remember that we hit some conflicts between different
extension ROMs back at HP :-(

Another point is that extensions that are actually interesting require
a lot more than new syntax.  Robert mentioned the rewriter, but you'd
typically need planner and executor additions as well.  It's possible to
see how whole new plan node types might be added by a plugin so far as
the executor is concerned, but I haven't a clue how we'd get the planner
to emit them ...
        regards, tom lane



Re: Proposal to add --single-row to psql

From
Robert Haas
Date:
On Sat, May 11, 2013 at 12:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> By the time you've got an expression tree, the problem is mostly solved,
> at least so far as parser extension is concerned.

Right.

> More years ago than I care to admit, I worked on systems that had
> run-time-extensible parsers at Hewlett-Packard, so technology for this
> does exist.  But my (vague) memory of those systems is that the parser's
> language capabilities were more limited than bison's, perhaps only
> LL(1).  Parsing spec-compatible SQL that way might be a challenge.

If I understand bison correctly, it basically looks at the current
parser state and the next token and decides to either shift that token
onto the stack or reduce the stack using some rule.  If there's no
matching rule, we error out.  If someone wants to inject new rules
into the grammar, those state tables are all going to change.  But if
we could contrive things so that the state tables are built
dynamically and can be change as rules are added and removed, then it
seems to me that we could let a loadable module supply (or delete)
whatever grammar rules it likes.  Whenever it does this, we recompile
the grammar on next use (and complain if we get ambiguities).  This
does not sound all that easy to code, but at least in theory it seems
doable.

We'd also need a way to add keywords.  To be quite frank, I think our
whole approach to keywords is massive screwed up right now.  Aside
from the parser bloat, adding even unreserved keywords breaks stuff,
e.g. "SELECT 1 this_is_not_yet_a_keyword".  EnterpriseDB's internal
bug-tracking system must have at least half a dozen bug reports open
right now that are attributable to keywords either existing at all or
being more reserved than they are in Oracle.  That aside, the amount
of trouble we've caused for PostgreSQL users over the years, either by
adding new keywords or disallowing them in contexts where they used to
work, is not small.  I don't have a specific proposal for what we
should do to make this problem less painful, but I think if we ever
revise our parser infrastructure we ought to use that as an
opportunity to try to come up with something better than what exists
today, because the status quo is awfully painful.

> Another point is that extensions that are actually interesting require
> a lot more than new syntax.  Robert mentioned the rewriter, but you'd
> typically need planner and executor additions as well.  It's possible to
> see how whole new plan node types might be added by a plugin so far as
> the executor is concerned, but I haven't a clue how we'd get the planner
> to emit them ...

I don't have any ideas either.  Still, solving some of the problems
would be better than solving none of them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Proposal to add --single-row to psql

From
Jim Nasby
Date:
On 5/11/13 11:27 AM, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
>> On Sat, May 11, 2013 at 11:17:03AM -0400, Robert Haas wrote:
>>> Some kind of extendable parser would be awesome.  It would need to tie
>>> into the rewriter also.
>>>
>>> No, I don't have a clue what the design looks like.
>
>> That's a direction several of the proprietary RDBMS vendors have
>> proposed.  I think it'd be great :)
>
>> Pre-coffee (yeah, I know.  Bad idea.) sketch of an idea: create an API
>> to pass expression trees in and out.  This could have other benefits
>> as to clustering space, shortening the planning cycle, etc., but let's
>> not go there for now.  My knowledge is very, very sketchy, but when I
>> squint, the expression trees we use look a lot like JSON.  Are they
>> isomorphic?
>
> By the time you've got an expression tree, the problem is mostly solved,
> at least so far as parser extension is concerned.
>
> More years ago than I care to admit, I worked on systems that had
> run-time-extensible parsers at Hewlett-Packard, so technology for this
> does exist.  But my (vague) memory of those systems is that the parser's
> language capabilities were more limited than bison's, perhaps only
> LL(1).  Parsing spec-compatible SQL that way might be a challenge.
>
> A larger issue is that if you don't have the whole grammar available
> to check, it's difficult to be sure there are no parsing conflicts.
> I seem to remember that we hit some conflicts between different
> extension ROMs back at HP :-(
>
> Another point is that extensions that are actually interesting require
> a lot more than new syntax.  Robert mentioned the rewriter, but you'd
> typically need planner and executor additions as well.  It's possible to
> see how whole new plan node types might be added by a plugin so far as
> the executor is concerned, but I haven't a clue how we'd get the planner
> to emit them ...

FWIW, I've wanted the ability to plug into the parser not for an extension, but so that I could programmaticly enforce
certaincoding conventions. TBH, it would be ideal if that was possible without requiring a backend (ie, some kind of
stand-alonelibrary/tool), but exposing it via a backend would be better than nothing.
 
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: Proposal to add --single-row to psql

From
Peter Geoghegan
Date:
On Sun, May 12, 2013 at 1:18 PM, Jim Nasby <jim@nasby.net> wrote:
> FWIW, I've wanted the ability to plug into the parser not for an extension,
> but so that I could programmaticly enforce certain coding conventions.

Depending on the exact requirements, that probably wouldn't be too
difficult. It'd likely entail adding a hook for plugins to inspect raw
parse trees.

The pg_stat_statements core infrastructure changes for 9.2 added a
hook that allowed pg_stat_statements to inspect the post-parse
analysis (query) tree. That wouldn't be suitable for what you want,
though, because that tree loses many of the details that you are
likely to care about but are inconsequential to the semantics of the
query. For example, you couldn't tell from the post-parse analysis
tree if the query text was originally "select all, the, columns from
table" or "select * from table".


-- 
Peter Geoghegan



Re: Proposal to add --single-row to psql

From
Amit Kapila
Date:
On Sunday, May 12, 2013 4:53 AM Robert Haas wrote:
> On Sat, May 11, 2013 at 12:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > By the time you've got an expression tree, the problem is mostly
> solved,
> > at least so far as parser extension is concerned.
> 
> Right.
> 
> > More years ago than I care to admit, I worked on systems that had
> > run-time-extensible parsers at Hewlett-Packard, so technology for
> this
> > does exist.  But my (vague) memory of those systems is that the
> parser's
> > language capabilities were more limited than bison's, perhaps only
> > LL(1).  Parsing spec-compatible SQL that way might be a challenge.
> 
> If I understand bison correctly, it basically looks at the current
> parser state and the next token and decides to either shift that token
> onto the stack or reduce the stack using some rule.  If there's no
> matching rule, we error out.  If someone wants to inject new rules
> into the grammar, those state tables are all going to change.  But if
> we could contrive things so that the state tables are built
> dynamically and can be change as rules are added and removed, then it
> seems to me that we could let a loadable module supply (or delete)
> whatever grammar rules it likes.  Whenever it does this, we recompile
> the grammar on next use (and complain if we get ambiguities).  This
> does not sound all that easy to code, but at least in theory it seems
> doable.

Can we think of a pluggable parser whose responsibility is to produce raw
parse tree structure similar to PostgreSQL, but internally it can handle
different database syntax (e.g Oracle).
For example below syntaxes of Oracle can be converted to PostgreSQL raw
parse tree

Oracle syntax: 
SELECT 1 + 10 * 100 + 1000 from DUAL; 
PostgreSQL syntax: 
SELECT 1 + 10 * 100 + 1000; 


Oracle syntax: 
CREATE USER xxxx IDENTIFIED BY 'password'; 
ALTER USER xxxx IDENTIFiED BY 'password'; 
PostgreSQL syntax: 
CREATE USER xxxx PASSWORD 'password'; 
ALTER USER xxxx PASSWORD 'password';

Currently if somebody has to migrate from Sybase, Oracle, SQL Server, MySQL
to PostgreSQL, he has to either use some converter tool or enhance existing
gram.y to support it.
In case current PG parser is pluggable, he can write it's parser which can
produce PG specific parse tree for another database syntax. User can load
database specific 
Parser during server start time by specifying corresponding parser dll/so in
shared_preload_libraries. 
Advantage of writing PG based parser for other database syntax is user can
directly use script files of other database, no need of conversion every
time.

Though I am not sure, but I think 70~80% of commonly used SQL syntax is
common among different databases as all standard databases are SQL-spec
compliant, so if such a parser facility
is provided, it can save lot of time taken for conversion for customers who
want to migrate from other database syntax to PostgreSQL.


With Regards,
Amit Kapila.




Re: Proposal to add --single-row to psql

From
Bruce Momjian
Date:
On Sat, May 11, 2013 at 07:22:55PM -0400, Robert Haas wrote:
> On Sat, May 11, 2013 at 12:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > By the time you've got an expression tree, the problem is mostly solved,
> > at least so far as parser extension is concerned.
> 
> Right.
> 
> > More years ago than I care to admit, I worked on systems that had
> > run-time-extensible parsers at Hewlett-Packard, so technology for this
> > does exist.  But my (vague) memory of those systems is that the parser's
> > language capabilities were more limited than bison's, perhaps only
> > LL(1).  Parsing spec-compatible SQL that way might be a challenge.
> 
> If I understand bison correctly, it basically looks at the current
> parser state and the next token and decides to either shift that token
> onto the stack or reduce the stack using some rule.  If there's no
> matching rule, we error out.  If someone wants to inject new rules
> into the grammar, those state tables are all going to change.  But if
> we could contrive things so that the state tables are built
> dynamically and can be change as rules are added and removed, then it
> seems to me that we could let a loadable module supply (or delete)
> whatever grammar rules it likes.  Whenever it does this, we recompile
> the grammar on next use (and complain if we get ambiguities).  This
> does not sound all that easy to code, but at least in theory it seems
> doable.
> 
> We'd also need a way to add keywords.  To be quite frank, I think our
> whole approach to keywords is massive screwed up right now.  Aside

FYI, one trick that is often used when parsing C code is to allow parsed
code to add to the keyword list, so when the parser goes to get the next
token, it might get "keyword" back based on what it has seen already. 
This is required to handle C typedefs, and might be a method we could
use.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: Proposal to add --single-row to psql

From
Darren Duncan
Date:
I have actually been working on the task discussed in this thread, most relevant 
parts quoted below, for awhile now, and hope to have something concrete that you 
can use by the end of this summer.

My in-development Muldis D language is homoiconic as a core feature, its source 
code is data to it, and the native syntax of the language resembles an abstract 
syntax tree.  This tree of nodes primarily defines behavior of the code, but it 
also supports arbitrary metadata per node, which for example can be used to 
preserve concrete syntax for any programming language that can be parsed into 
Muldis D nodes or conversely generated from said.

For example, you can have one type of node defining a function, and its details 
are defined by its attributes or child nodes, such as its result type, its 
parameters, whether it is declared associative/commutative/etc or not, and the 
expression(s) defining its body.  Another type of node defines a call to a 
function, another type defines a text literal, another a relation literal, and 
so on.  Conversely, a node can define a schema or package etc.  Example 
metadata, which is also structured, could include line numbers or code comments 
or whitespace or exact numeric literal formats or quoting formats or exact 
keyword choices, for example.

Using these node data types, we have a fairly normalized representation of any 
source code (or data) that is easy to introspect or transform with code.  A key 
design of Muldis D is the clear separation of syntax and behavior.

A parser is just a function that takes (typically) a character string as input 
and produces a node (tree) as output.  A compiler is just a function or 
operation that takes a node (tree) as input and produces machine code.  An 
optimizer can be a function that derives one node tree from another, either as 
its own operation or typically as part of the compiler stage.

A compiler or optimizer generally can trivially ignore the node metadata, but a 
code generator or debugger can use it; metadata can be stripped without 
affecting behavior.  The canonical tree form can also easily be mapped 
losslessly with relation forms, such as typical information schemas have.

Practically all behavior is defined in terms of generic type and routine 
definitions.  Practically all system-defined types and routines are defined in 
one or more libraries/modules/packages that have the same format as those users 
would write like extensions.  So, all the relational operators have the same 
syntax as say string or numeric or array operators.

I envision that the most effective way to use Muldis D to handle an arbitrary 
programming language, including the native SQL syntax+behavior of each SQL DBMS, 
is to have a Muldis D library+parser pair for it.

For example, to represent PostgreSQL 9.2 most directly, we have a library with 
an explicitly defined type and routine for every built-in that Pg 9.2 has, and 
we also have a parser function that takes the SQL syntax that Pg 9.2 understands 
and produces a Muldis D node tree consisting of calls to the routines of that 
library or value selectors of types in that library (things like SELECT and 
INSERT etc are each mapped to a routine too).  That way, even with a standard 
node format that isn't specific to a typical language or version, the code for 
parsing Pg 9.2 SQL has the minimal amount of indirection that it has to deal 
with, as each syntax element would have a direct library call counterpart. 
Similarly, the existing Pg 9.2 SQL compiler would have the least indirection to 
take said nodes and execute them.  (The library would be named eg "Postgres_9_2" 
for example, which makes it easier say to also have one side-by-side for other 
versions, shims, legacy code you want to more easily support compatibility with.)

Where one decides to do cross-compilation, say make Oracle SQL run on Pg, that 
could be done as simply as defining a library for Oracle SQL with the 
routines+types that has, and then mapping it to a Pg one just in terms of shim 
calls (which the compiler can optimize away as applicable), and so parsers or 
compilers never necessarily have to deal with behavior compatibility issues.

I am presently working out the exact set of such language nodes, and making a 
reference implementation which is mostly self-defined and would compile to Perl 
code, and hope to have the first Muldis D executable by the end of this summer.

I am confident that an adaption of this design into C or whatever would serve 
Postgres greatly in letting it effectively parse multiple languages, anywhere 
from application programming languages to multiple SQL dialects or versions.

Even if you don't decide to use my design specifically (which is open source and 
you can influence it), I think you should find some of the general design 
principles I stated above to be useful.  Representing behavior as libraries the 
AST being flexible enough for any concrete language without being too specific 
to details of one.  And of course, cross-invocation of code written in multiple 
languages is made much easier.

Note, just to be clear, my proposal does not necessitate that all of a node tree 
has to be kept in memory at once.  This design should be adaptable to a 
streaming approach, especially as it is expected to be able to handle database 
dumps or transfers of arbitrary size, same as SQL engines can today.  That is in 
contrast to probably what most application languages would assume, where 
everything would fit in memory at once.  But the ability to stream or not would 
largely be an implementation detail.  Realistically, all code should fit in 
memory at once, and anything that would have to be buffered out of memory would 
be say embedded data literals (whether large strings or simply large relations).

If you're not sure how my proposal would address any of the needs or wants 
raised in the thread, go ahead and ask, and I will try and answer as time permits.

-- Darren Duncan

On 2013.05.11 9:27 AM, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
>> On Sat, May 11, 2013 at 11:17:03AM -0400, Robert Haas wrote:
>>> Some kind of extendable parser would be awesome.  It would need to tie
>>> into the rewriter also.
>>>
>>> No, I don't have a clue what the design looks like.
>
>> That's a direction several of the proprietary RDBMS vendors have
>> proposed.  I think it'd be great :)
>
>> Pre-coffee (yeah, I know.  Bad idea.) sketch of an idea: create an API
>> to pass expression trees in and out.  This could have other benefits
>> as to clustering space, shortening the planning cycle, etc., but let's
>> not go there for now.  My knowledge is very, very sketchy, but when I
>> squint, the expression trees we use look a lot like JSON.  Are they
>> isomorphic?
>
> By the time you've got an expression tree, the problem is mostly solved,
> at least so far as parser extension is concerned.
>
> More years ago than I care to admit, I worked on systems that had
> run-time-extensible parsers at Hewlett-Packard, so technology for this
> does exist.  But my (vague) memory of those systems is that the parser's
> language capabilities were more limited than bison's, perhaps only
> LL(1).  Parsing spec-compatible SQL that way might be a challenge.
>
> A larger issue is that if you don't have the whole grammar available
> to check, it's difficult to be sure there are no parsing conflicts.
> I seem to remember that we hit some conflicts between different
> extension ROMs back at HP :-(
>
> Another point is that extensions that are actually interesting require
> a lot more than new syntax.  Robert mentioned the rewriter, but you'd
> typically need planner and executor additions as well.  It's possible to
> see how whole new plan node types might be added by a plugin so far as
> the executor is concerned, but I haven't a clue how we'd get the planner
> to emit them ...