Thread: COPY as a set returning function

COPY as a set returning function

From
Corey Huinker
Date:
Attached is a _very_ rough patch implementing a proof-of-concept function copy_srf();

It allows you to do things like this:

# select a,c,e from copy_srf('echo 12,345,67,89,2016-01-01',true) as t(a integer, b text, c text, d text, e date);
 a  | c  |     e
----+----+------------
 12 | 67 | 2016-01-01
(1 row)


Uses for this include:
- avoiding the pattern of creating a temp table just to select all the rows back out and then discard the table (avoidable disk activity, avoidable oid churn)
- avoiding the pattern of creating a file_fdw table in pg_temp just to drop it after one select (avoidable oid churn)
- filtering file/program input by the columns that are relevant to the user's needs.

This experiment arose from my realization that file_fdw just plugs into the externally visible portions of copy.c to do all of it's work. So why couldn't we do the same for a set returning function? Of course it wasn't as simple as that. The existing Begin/NextCopyFrom functions require the ->rel to be a valid Oid...which we won't have in this context, so I had to bypass that code and use CopyFromRawFields() directly...which isn't externally visible, hence this being a patch to core rather than an extension.

Currently the function only accepts two parameters, "filename" and "is_program". Header is always false and csv mode is always true. Obviously if we go forward on this, we'll want to add that functionality back in, but I'm holding off for now to keep the example simple and wait for consensus on future direction.

As for that future direction, we could either have:
- a robust function named something like copy_srf(), with parameters for all of the relevant options found in the COPY command
- a function that accepts an options string and parse that
- we could alter the grammar to make COPY RETURNING col1, col3, col5 FROM 'filename' a legit CTE.

Regardless of the path forward, I'm going to need help in getting there, hence this email. Thank you for your consideration.
Attachment

Re: COPY as a set returning function

From
Tom Lane
Date:
Corey Huinker <corey.huinker@gmail.com> writes:
> Attached is a _very_ rough patch implementing a proof-of-concept function
> copy_srf();
> ...
> As for that future direction, we could either have:
> - a robust function named something like copy_srf(), with parameters for
> all of the relevant options found in the COPY command
> - a function that accepts an options string and parse that
> - we could alter the grammar to make COPY RETURNING col1, col3, col5 FROM
> 'filename' a legit CTE.

I think the last of those suggestions has come up before.  It has the
large advantage that you don't have to remember a different syntax for
copy-as-a-function.  Once you had the framework for that, other
rows-returning utility commands such as EXPLAIN might plug in as well,
whenever somebody got enough of an itch for it.
        regards, tom lane



Re: COPY as a set returning function

From
Craig Ringer
Date:
<p dir="ltr"><p dir="ltr">On 1 Oct. 2016 05:20, "Tom Lane" <<a
href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>>wrote:<br /> ><br /> > Corey Huinker <<a
href="mailto:corey.huinker@gmail.com">corey.huinker@gmail.com</a>>writes:<br /> > > Attached is a _very_ rough
patchimplementing a proof-of-concept function<br /> > > copy_srf();<br /> > > ...<br /> > > As for
thatfuture direction, we could either have:<br /> > > - a robust function named something like copy_srf(), with
parametersfor<br /> > > all of the relevant options found in the COPY command<br /> > > - a function that
acceptsan options string and parse that<br /> > > - we could alter the grammar to make COPY RETURNING col1, col3,
col5FROM<br /> > > 'filename' a legit CTE.<br /> ><br /> > I think the last of those suggestions has come
upbefore.  It has the<br /> > large advantage that you don't have to remember a different syntax for<br /> >
copy-as-a-function. Once you had the framework for that, other<br /> > rows-returning utility commands such as
EXPLAINmight plug in as well,<br /> > whenever somebody got enough of an itch for it.<br /><p dir="ltr">That sounds
fantastic.It'd help this copy variant retain festure parity with normal copy. And it'd bring us closer to being able to
FETCHin non queries.<p dir="ltr">>                         regards, tom lane<br /> ><br /> ><br /> > --<br
/>> Sent via pgsql-hackers mailing list (<a
href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br/> > To make changes to your
subscription:<br/> > <a
href="http://www.postgresql.org/mailpref/pgsql-hackers">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/> 

Re: COPY as a set returning function

From
Tom Lane
Date:
Craig Ringer <craig.ringer@2ndquadrant.com> writes:
> On 1 Oct. 2016 05:20, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> I think the last of those suggestions has come up before.  It has the
>> large advantage that you don't have to remember a different syntax for
>> copy-as-a-function.

> That sounds fantastic. It'd help this copy variant retain festure parity
> with normal copy. And it'd bring us closer to being able to FETCH in non
> queries.

On second thought, though, this couldn't exactly duplicate the existing
COPY syntax, because COPY relies heavily on the rowtype of the named
target table to tell it what it's copying.  You'd need some new syntax
to provide the list of column names and types, which puts a bit of
a hole in the "syntax we already know" argument.  A SRF-returning-record
would have a leg up on that, because we do have existing syntax for
defining the concrete rowtype that any particular call returns.
        regards, tom lane



Re: COPY as a set returning function

From
Corey Huinker
Date:
<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px#ccc solid;padding-left:1ex"><span>> That sounds fantastic. It'd help this copy variant retain
festureparity<br /> > with normal copy. And it'd bring us closer to being able to FETCH in non<br /> >
queries.<br/><br /></span>On second thought, though, this couldn't exactly duplicate the existing<br /> COPY syntax,
becauseCOPY relies heavily on the rowtype of the named<br /> target table to tell it what it's copying.  You'd need
somenew syntax<br /> to provide the list of column names and types, which puts a bit of<br /> a hole in the "syntax we
alreadyknow" argument.  A SRF-returning-record<br /> would have a leg up on that, because we do have existing syntax
for<br/> defining the concrete rowtype that any particular call returns.<br /><br />                         regards,
tomlane<br /></blockquote></div><br /></div><div class="gmail_extra"><br /></div><div class="gmail_extra">I would like
tomake COPY itself a SRF. That's a bit beyond my capabilities, so if that is the route we want to go, I will need
help.</div><divclass="gmail_extra"><br /></div><div class="gmail_extra">The syntax would probably look like this (new
bitsin bold):</div><div class="gmail_extra"><br /></div><blockquote style="margin:0 0 0
40px;border:none;padding:0px"><divclass="gmail_extra"><font face="monospace, monospace">WITH my_copy  AS
(</font></div><divclass="gmail_extra"><font face="monospace, monospace">    COPY FROM 'example.csv' TO <b>RESULT SET(c1
text,c2 integer, dummy1 text, dummy2 text, c5 date)</b> WITH (FORMAT CSV)</font></div><div class="gmail_extra"><font
face="monospace,monospace">    <b>RETURNING c1, c2, c3</b></font></div><div class="gmail_extra"><font face="monospace,
monospace">)</font></div><divclass="gmail_extra"><font face="monospace, monospace">SELECT ...</font></div><div
class="gmail_extra"><fontface="monospace, monospace">FROM my_copy</font></div><div class="gmail_extra"><font
face="monospace,monospace">LEFT OUTER JOIN ref_table ...</font></div></blockquote><div class="gmail_extra"><br
/></div><divclass="gmail_extra">The RESULT SET (colspecs) bit would be the rsinfo currently used by copy_srf(). It
wouldbe nice if the CTE declaration could take types, but it can't.<br /></div><div class="gmail_extra"><br
/></div><divclass="gmail_extra">The RETURNING clause here doesn't return all the columns made available from the COPY.
Thatwould be nice, but not required because the same filtration could be done when the CTE is referenced. So if we
require<font face="monospace, monospace">RETURNING *</font> be the only returning option I'd be fine with
that.</div><divclass="gmail_extra"><br /></div><div class="gmail_extra">If we're ok with adding a function like
copy_srf()to the core, will we still be happy with it when COPY does get a RETURNING clause? </div><div
class="gmail_extra"><br/></div><div class="gmail_extra"><br /></div><div class="gmail_extra">Somewhat off-topic: here's
someother observations of a n00b who spent a fair amount of time looking at the copy.c code.<br /></div><div
class="gmail_extra"><br/></div><div class="gmail_extra">1. BeginCopyFrom and NextCopyFrom pull attlist/tupdesc info
from->rel, repeatedly. If we were going to try to leverage that code we'd need to store those things in a separate
cstatemember so that we add complexity only in the initialization of the copy state data struct, pulling the result
structurefrom rsinfo rather than a relation. There's probably a minor performance gain to be had in keeping that info
around.Refactoring those two procs to allow for a pre-set attlist/tupdesc would help.</div><div class="gmail_extra"><br
/></div><divclass="gmail_extra">2. NextCopyFrom() checks every single time to see if it's row 0 and if it should skip
thisheader row. I know a single (row_num == 0 && has_header) isn't much extra processing, but shouldn't we
digestand discard headers before going into the per-row loop?<br /></div><div class="gmail_extra"><br /></div><div
class="gmail_extra">3.All the code that handles indexes, constraints, buffering, etc, simply doesn't apply in the SRF
context.</div><divclass="gmail_extra"><br /></div><div class="gmail_extra">4. The code somewhat needlessly mixes code
forthe COPY FROM and COPY TO cases. There's probably a good reason for this, but it made for a lot of clutter in
achievingmy very narrow goal.</div><div class="gmail_extra"><br /></div><div class="gmail_extra"><br /></div><div
class="gmail_extra"><br/></div><div class="gmail_extra"><br /></div><div class="gmail_extra"><br /></div><div
class="gmail_extra"><br/></div><div class="gmail_extra"><br /></div><div class="gmail_extra"><br /></div><div
class="gmail_extra"><br/></div><div class="gmail_extra"><br /></div><div class="gmail_extra"><br /></div><div
class="gmail_extra"><br/></div><div class="gmail_extra"><br /></div></div> 

Re: COPY as a set returning function

From
Craig Ringer
Date:
<p dir="ltr">On 15 Oct. 2016 04:56, "Corey Huinker" <<a
href="mailto:corey.huinker@gmail.com">corey.huinker@gmail.com</a>>wrote:<p dir="ltr">> I would like to make COPY
itselfa SRF. That's a bit beyond my capabilities, so if that is the route we want to go, I will need help.<br />
><br/> > The syntax would probably look like this (new bits in bold):<br /> ><br /> >> WITH my_copy  AS
(<br/> >>     COPY FROM 'example.csv' TO RESULT SET(c1 text, c2 integer, dummy1 text, dummy2 text, c5 date) WITH
(FORMATCSV)<br /> >>     RETURNING c1, c2, c3<br /> >> )<p dir="ltr">Strong -1 from me on this approach.
OurCTE implementation materializes everything so this is no better than COPYing to a temp table.<p dir="ltr">Not unless
youplan to fix that (and figure out the backward compatibility issues since the bug is documented as a feature) or
implementRETURNING in subqueries... I'd go for the function. 

Re: COPY as a set returning function

From
Corey Huinker
Date:
<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Sun, Oct 16, 2016 at 9:01 AM, Craig Ringer <span
dir="ltr"><<ahref="mailto:craig.ringer@2ndquadrant.com" target="_blank">craig.ringer@2ndquadrant.com</a>></span>
wrote:<br/><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex"><span><pdir="ltr">On 15 Oct. 2016 04:56, "Corey Huinker" <<a
href="mailto:corey.huinker@gmail.com"target="_blank">corey.huinker@gmail.com</a>> wrote:<p dir="ltr">> I would
liketo make COPY itself a SRF. That's a bit beyond my capabilities, so if that is the route we want to go, I will need
help.<br/> ><br /> > The syntax would probably look like this (new bits in bold):<br /> ><br /> >> WITH
my_copy AS (<br /> >>     COPY FROM 'example.csv' TO RESULT SET(c1 text, c2 integer, dummy1 text, dummy2 text, c5
date)WITH (FORMAT CSV)<br /> >>     RETURNING c1, c2, c3<br /> >> )</span><p dir="ltr">Strong -1 from me on
thisapproach. Our CTE implementation materializes everything so this is no better than COPYing to a temp table.<p
dir="ltr">Notunless you plan to fix that (and figure out the backward compatibility issues since the bug is documented
asa feature) or implement RETURNING in subqueries... I'd go for the function.</blockquote></div><br /></div><div
class="gmail_extra">Well,it saves burning the oid and the pg_attribute rows. A few long running transactions can cause
pg_attributeto bloat to 400GB on one of our systems - hence my wanting something like this function.<br /><br />If it
doesstay a function, we only need to implement 8 of the 12 options as parameters (FREEZE and FORCE* options don't
apply).My guess is that future options added to COPY will be more about handling output or optimizing table inserts,
neitherof which mean more options for this proposed function.<br /><br />Would the best approach be to build in a core
srf-returningfunction that might be deprecated once COPY is set-returning AND CTEs don't have to materialize, or to
refactorwhat's in copy.c such that a contrib module can easily plug into it, and have copy_srf live there?</div><div
class="gmail_extra"><br/></div><div class="gmail_extra"><br /></div><div class="gmail_extra"><br /></div></div> 

Re: COPY as a set returning function

From
Merlin Moncure
Date:
On Fri, Sep 30, 2016 at 9:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Craig Ringer <craig.ringer@2ndquadrant.com> writes:
>> On 1 Oct. 2016 05:20, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>>> I think the last of those suggestions has come up before.  It has the
>>> large advantage that you don't have to remember a different syntax for
>>> copy-as-a-function.
>
>> That sounds fantastic. It'd help this copy variant retain festure parity
>> with normal copy. And it'd bring us closer to being able to FETCH in non
>> queries.
>
> On second thought, though, this couldn't exactly duplicate the existing
> COPY syntax, because COPY relies heavily on the rowtype of the named
> target table to tell it what it's copying.  You'd need some new syntax
> to provide the list of column names and types, which puts a bit of
> a hole in the "syntax we already know" argument.  A SRF-returning-record
> would have a leg up on that, because we do have existing syntax for
> defining the concrete rowtype that any particular call returns.

One big disadvantage of SRF-returning-record syntax is that functions
are basically unwrappable with generic wrappers sans major gymnastics
such as dynamically generating the query and executing it.  This is a
major disadvantage relative to the null::type hack we use in the
populate_record style functions and perhaps ought to make this
(SRF-returning-record syntax) style of use discouraged for useful
library functions.  If there were a way to handle wrapping I'd
withdraw this minor objection -- this has come up in dblink
discussions a few times).

merlin



Re: COPY as a set returning function

From
Corey Huinker
Date:
Attached is a patch that implements copy_srf().

The function signature reflects cstate more than it reflects the COPY options (filename+is_program instead of  FILENAME or PROGRAM, etc)

CREATE OR REPLACE FUNCTION copy_srf(
   filename    text DEFAULT null,
   is_program  boolean DEFAULT false,
   format      text DEFAULT 'text',  /* accepts text, csv, binary */
   delimiter   text DEFAULT null,
   null_string text DEFAULT E'\\N',
   header      boolean DEFAULT false,
   quote       text DEFAULT null,
   escape      text DEFAULT null,
   encoding    text DEFAULT null)
RETURNS SETOF RECORD

The major utility for this (at least for me) will be in ETLs that currently make a lot of use of temp tables, and wish to either reduce I/O or avoid pg_attribute bloat.

I have not yet implemented STDIN mode, but it's a start.

$ psql test
psql (10devel)
Type "help" for help.

# select * from    copy_srf('echo 1,2; echo 4,5',true,'csv') as t(x text, y text);
 x | y
---+---
 1 | 2
 4 | 5
(2 rows)

Time: 1.456 ms
# select * from    copy_srf('/tmp/small_file.txt'::text,false,'text') as t(x text, y text);
  x  |  y
-----+-----
 1   | 2
 a   | b
 cde | fgh
(3 rows)


On Mon, Oct 17, 2016 at 2:33 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Fri, Sep 30, 2016 at 9:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Craig Ringer <craig.ringer@2ndquadrant.com> writes:
>> On 1 Oct. 2016 05:20, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>>> I think the last of those suggestions has come up before.  It has the
>>> large advantage that you don't have to remember a different syntax for
>>> copy-as-a-function.
>
>> That sounds fantastic. It'd help this copy variant retain festure parity
>> with normal copy. And it'd bring us closer to being able to FETCH in non
>> queries.
>
> On second thought, though, this couldn't exactly duplicate the existing
> COPY syntax, because COPY relies heavily on the rowtype of the named
> target table to tell it what it's copying.  You'd need some new syntax
> to provide the list of column names and types, which puts a bit of
> a hole in the "syntax we already know" argument.  A SRF-returning-record
> would have a leg up on that, because we do have existing syntax for
> defining the concrete rowtype that any particular call returns.

One big disadvantage of SRF-returning-record syntax is that functions
are basically unwrappable with generic wrappers sans major gymnastics
such as dynamically generating the query and executing it.  This is a
major disadvantage relative to the null::type hack we use in the
populate_record style functions and perhaps ought to make this
(SRF-returning-record syntax) style of use discouraged for useful
library functions.  If there were a way to handle wrapping I'd
withdraw this minor objection -- this has come up in dblink
discussions a few times).

merlin

Attachment

Re: COPY as a set returning function

From
Haribabu Kommi
Date:


On Tue, Nov 1, 2016 at 7:45 AM, Corey Huinker <corey.huinker@gmail.com> wrote:
Attached is a patch that implements copy_srf().

Moved to next CF with "needs review" status.


Regards,
Hari Babu
Fujitsu Australia

Re: [HACKERS] COPY as a set returning function

From
Michael Paquier
Date:
On Mon, Dec 5, 2016 at 2:10 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> On Tue, Nov 1, 2016 at 7:45 AM, Corey Huinker <corey.huinker@gmail.com>
> wrote:
>>
>> Attached is a patch that implements copy_srf().
>
> Moved to next CF with "needs review" status.

This patch is still waiting for review. David, are you planning to
look at it by the end of the CF?
-- 
Michael



Re: [HACKERS] COPY as a set returning function

From
David Fetter
Date:
On Wed, Jan 25, 2017 at 02:37:57PM +0900, Michael Paquier wrote:
> On Mon, Dec 5, 2016 at 2:10 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> > On Tue, Nov 1, 2016 at 7:45 AM, Corey Huinker <corey.huinker@gmail.com>
> > wrote:
> >>
> >> Attached is a patch that implements copy_srf().
> >
> > Moved to next CF with "needs review" status.
> 
> This patch is still waiting for review. David, are you planning to
> look at it by the end of the CF?

I'll be doing this today.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] COPY as a set returning function

From
David Fetter
Date:
On Mon, Oct 31, 2016 at 04:45:40PM -0400, Corey Huinker wrote:
> Attached is a patch that implements copy_srf().
> 
> The function signature reflects cstate more than it reflects the COPY
> options (filename+is_program instead of  FILENAME or PROGRAM, etc)

The patch as it stands needs a rebase.  I'll see what I can do today.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] COPY as a set returning function

From
David Fetter
Date:
On Wed, Jan 25, 2017 at 06:16:16AM -0800, David Fetter wrote:
> On Mon, Oct 31, 2016 at 04:45:40PM -0400, Corey Huinker wrote:
> > Attached is a patch that implements copy_srf().
> > 
> > The function signature reflects cstate more than it reflects the COPY
> > options (filename+is_program instead of  FILENAME or PROGRAM, etc)
> 
> The patch as it stands needs a rebase.  I'll see what I can do today.

Please find attached a rebase, which fixes an OID collision that crept in.

- The patch builds against master and passes "make check".
- The patch does not contain user-visible documentation or examples.
- The patch does not contain tests.

I got the following when I did a brief test.

SELECT * FROM copy_srf(filename => 'ls', is_program => true) AS t(l text);
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] COPY as a set returning function

From
Alvaro Herrera
Date:
David Fetter wrote:

> @@ -562,7 +563,6 @@ CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
>                           errmsg("could not read from COPY file: %m")));
>              break;
>          case COPY_OLD_FE:
> -
>              /*
>               * We cannot read more than minread bytes (which in practice is 1)
>               * because old protocol doesn't have any clear way of separating

This change is pointless as it'd be undone by pgindent.

> +    /*
> +     * Function signature is:
> +     * copy_srf( filename text default null,
> +     *           is_program boolean default false,
> +     *           format text default 'text',
> +     *           delimiter text default E'\t' in text mode, ',' in csv mode,
> +     *           null_string text default '\N',
> +     *           header boolean default false,
> +     *           quote text default '"' in csv mode only,
> +     *           escape text default null, -- defaults to whatever quote is
> +     *           encoding text default null
> +     */

This comment would be mangled by pgindent -- please add an /*--- marker
to prevent it.

> +    /* param 7: escape text default null, -- defaults to whatever quote is */
> +    if (PG_ARGISNULL(7))
> +    {
> +        copy_state.escape = copy_state.quote;
> +    }
> +    else
> +    {
> +        if (copy_state.csv_mode)
> +        {
> +            copy_state.escape = TextDatumGetCString(PG_GETARG_TEXT_P(7));
> +            if (strlen(copy_state.escape) != 1)
> +            {
> +                ereport(ERROR,
> +                        (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                         errmsg("COPY escape must be a single one-byte character")));
> +            }
> +        }
> +        else
> +        {
> +            ereport(ERROR,
> +                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                     errmsg("COPY escape available only in CSV mode")));
> +        }
> +    }

I don't understand why do we have all these checks.  Can't we just pass
the values to COPY and let it apply the checks?  That way, when COPY is
updated to support multibyte escape chars (for example) we don't need to
touch this code.  Together with removing the unneeded braces that would
make these stanzas about six lines long instead of fifteen.


> +        tuple = heap_form_tuple(tupdesc,values,nulls);
> +        //tuple = BuildTupleFromCStrings(attinmeta, field_strings);
> +        tuplestore_puttuple(tupstore, tuple);

No need to form a tuple; use tuplestore_putvalues here.


> +    }
> +
> +    /* close "file" */
> +    if (copy_state.is_program)
> +    {
> +        int         pclose_rc;
> +
> +        pclose_rc = ClosePipeStream(copy_state.copy_file);
> +        if (pclose_rc == -1)
> +            ereport(ERROR,
> +                    (errcode_for_file_access(),
> +                     errmsg("could not close pipe to external command: %m")));
> +        else if (pclose_rc != 0)
> +            ereport(ERROR,
> +                    (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
> +                     errmsg("program \"%s\" failed",
> +                            copy_state.filename),
> +                     errdetail_internal("%s", wait_result_to_str(pclose_rc))));
> +    }
> +    else
> +    {
> +        if (copy_state.filename != NULL && FreeFile(copy_state.copy_file))
> +            ereport(ERROR,
> +                    (errcode_for_file_access(),
> +                     errmsg("could not close file \"%s\": %m",
> +                            copy_state.filename)));
> +    }

I wonder if these should be an auxiliary function in copy.c to do this.
Surely copy.c itself does pretty much the same thing ...


> +DATA(insert OID = 3353 (  copy_srf PGNSP PGUID 12 1 0 0 0 f f f f f t v u 9 0 2249 "25 16 25 25 25 16 25 25 25"
_null__null_ _null_ _null_ _null_ copy_srf _null_ _null_ _null_ ));
 
> +DESCR("set-returning COPY proof of concept");

Why is this marked "proof of concept"?  If this is a PoC only, why are
you submitting it as a patch?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] COPY as a set returning function

From
Corey Huinker
Date:


On Wed, Jan 25, 2017 at 11:57 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
David Fetter wrote:

> @@ -562,7 +563,6 @@ CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
>                                                errmsg("could not read from COPY file: %m")));
>                       break;
>               case COPY_OLD_FE:
> -
>                       /*
>                        * We cannot read more than minread bytes (which in practice is 1)
>                        * because old protocol doesn't have any clear way of separating

This change is pointless as it'd be undone by pgindent.

> +     /*
> +      * Function signature is:
> +      * copy_srf( filename text default null,
> +      *           is_program boolean default false,
> +      *           format text default 'text',
> +      *           delimiter text default E'\t' in text mode, ',' in csv mode,
> +      *           null_string text default '\N',
> +      *           header boolean default false,
> +      *           quote text default '"' in csv mode only,
> +      *           escape text default null, -- defaults to whatever quote is
> +      *           encoding text default null
> +      */

This comment would be mangled by pgindent -- please add an /*--- marker
to prevent it.

> +     /* param 7: escape text default null, -- defaults to whatever quote is */
> +     if (PG_ARGISNULL(7))
> +     {
> +             copy_state.escape = copy_state.quote;
> +     }
> +     else
> +     {
> +             if (copy_state.csv_mode)
> +             {
> +                     copy_state.escape = TextDatumGetCString(PG_GETARG_TEXT_P(7));
> +                     if (strlen(copy_state.escape) != 1)
> +                     {
> +                             ereport(ERROR,
> +                                             (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                                              errmsg("COPY escape must be a single one-byte character")));
> +                     }
> +             }
> +             else
> +             {
> +                     ereport(ERROR,
> +                                     (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                                      errmsg("COPY escape available only in CSV mode")));
> +             }
> +     }

I don't understand why do we have all these checks.  Can't we just pass
the values to COPY and let it apply the checks?  That way, when COPY is
updated to support multibyte escape chars (for example) we don't need to
touch this code.  Together with removing the unneeded braces that would
make these stanzas about six lines long instead of fifteen.


> +             tuple = heap_form_tuple(tupdesc,values,nulls);
> +             //tuple = BuildTupleFromCStrings(attinmeta, field_strings);
> +             tuplestore_puttuple(tupstore, tuple);

No need to form a tuple; use tuplestore_putvalues here.


> +     }
> +
> +     /* close "file" */
> +     if (copy_state.is_program)
> +     {
> +             int         pclose_rc;
> +
> +             pclose_rc = ClosePipeStream(copy_state.copy_file);
> +             if (pclose_rc == -1)
> +                     ereport(ERROR,
> +                                     (errcode_for_file_access(),
> +                                      errmsg("could not close pipe to external command: %m")));
> +             else if (pclose_rc != 0)
> +                     ereport(ERROR,
> +                                     (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
> +                                      errmsg("program \"%s\" failed",
> +                                                     copy_state.filename),
> +                                      errdetail_internal("%s", wait_result_to_str(pclose_rc))));
> +     }
> +     else
> +     {
> +             if (copy_state.filename != NULL && FreeFile(copy_state.copy_file))
> +                     ereport(ERROR,
> +                                     (errcode_for_file_access(),
> +                                      errmsg("could not close file \"%s\": %m",
> +                                                     copy_state.filename)));
> +     }

I wonder if these should be an auxiliary function in copy.c to do this.
Surely copy.c itself does pretty much the same thing ...


> +DATA(insert OID = 3353 (  copy_srf PGNSP PGUID 12 1 0 0 0 f f f f f t v u 9 0 2249 "25 16 25 25 25 16 25 25 25" _null_ _null_ _null_ _null_ _null_ copy_srf _null_ _null_ _null_ ));
> +DESCR("set-returning COPY proof of concept");

Why is this marked "proof of concept"?  If this is a PoC only, why are
you submitting it as a patch?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Feel free to mark it returned as feedback. The concept didn't generate as much enthusiasm as I had hoped, so I think the right thing to do now is scale it back to a patch that makes CopyFromRawFields() externally visible so that extensions can use them. 

Re: [HACKERS] COPY as a set returning function

From
David Fetter
Date:
On Wed, Jan 25, 2017 at 12:23:23PM -0500, Corey Huinker wrote:
> 
> Feel free to mark it returned as feedback. The concept didn't
> generate as much enthusiasm as I had hoped, so I think the right
> thing to do now is scale it back to a patch that makes
> CopyFromRawFields() externally visible so that extensions can use
> them.

You're getting enthusiasm in the form of these reviews and suggestions
for improvement.  That it doesn't always happen immediately is a
byproduct of the scarcity of developer time and the sheer volume of
things to which people need to pay attention.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] COPY as a set returning function

From
Corey Huinker
Date:
On Wed, Jan 25, 2017 at 1:10 PM, David Fetter <david@fetter.org> wrote:
On Wed, Jan 25, 2017 at 12:23:23PM -0500, Corey Huinker wrote:
>
> Feel free to mark it returned as feedback. The concept didn't
> generate as much enthusiasm as I had hoped, so I think the right
> thing to do now is scale it back to a patch that makes
> CopyFromRawFields() externally visible so that extensions can use
> them.

You're getting enthusiasm in the form of these reviews and suggestions
for improvement.  That it doesn't always happen immediately is a
byproduct of the scarcity of developer time and the sheer volume of
things to which people need to pay attention.

True about that. I was referring to "ooh! I need that!"-type interest. I'll proceed, keeping in mind that there's a fallback position of just making some of the guts of COPY available to be called by extensions like was done for file_fdw.
 


 

Re: [HACKERS] COPY as a set returning function

From
Corey Huinker
Date:
> +     /* param 7: escape text default null, -- defaults to whatever quote is */
> +     if (PG_ARGISNULL(7))
> +     {
> +             copy_state.escape = copy_state.quote;
> +     }
> +     else
> +     {
> +             if (copy_state.csv_mode)
> +             {
> +                     copy_state.escape = TextDatumGetCString(PG_GETARG_TEXT_P(7));
> +                     if (strlen(copy_state.escape) != 1)
> +                     {
> +                             ereport(ERROR,
> +                                             (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                                              errmsg("COPY escape must be a single one-byte character")));
> +                     }
> +             }
> +             else
> +             {
> +                     ereport(ERROR,
> +                                     (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                                      errmsg("COPY escape available only in CSV mode")));
> +             }
> +     }

I don't understand why do we have all these checks.  Can't we just pass
the values to COPY and let it apply the checks?  That way, when COPY is
updated to support multibyte escape chars (for example) we don't need to
touch this code.  Together with removing the unneeded braces that would
make these stanzas about six lines long instead of fifteen.

If I understand you correctly, COPY (via BeginCopyFrom) itself relies on having a relation in pg_class to reference for attributes.
In this case, there is no such relation. So I'd have to fake a relcache entry, or refactor BeginCopyFrom() to extract a ReturnSetInfo from the Relation and pass that along to a new function BeginCopyFromReturnSet. I'm happy to go that route if you think it's a good idea.
 


> +             tuple = heap_form_tuple(tupdesc,values,nulls);
> +             //tuple = BuildTupleFromCStrings(attinmeta, field_strings);
> +             tuplestore_puttuple(tupstore, tuple);

No need to form a tuple; use tuplestore_putvalues here.

Good to know!

 
> +     }
> +
> +     /* close "file" */
> +     if (copy_state.is_program)
> +     {
> +             int         pclose_rc;
> +
> +             pclose_rc = ClosePipeStream(copy_state.copy_file);
> +             if (pclose_rc == -1)
> +                     ereport(ERROR,
> +                                     (errcode_for_file_access(),
> +                                      errmsg("could not close pipe to external command: %m")));
> +             else if (pclose_rc != 0)
> +                     ereport(ERROR,
> +                                     (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
> +                                      errmsg("program \"%s\" failed",
> +                                                     copy_state.filename),
> +                                      errdetail_internal("%s", wait_result_to_str(pclose_rc))));
> +     }
> +     else
> +     {
> +             if (copy_state.filename != NULL && FreeFile(copy_state.copy_file))
> +                     ereport(ERROR,
> +                                     (errcode_for_file_access(),
> +                                      errmsg("could not close file \"%s\": %m",
> +                                                     copy_state.filename)));
> +     }

I wonder if these should be an auxiliary function in copy.c to do this.
Surely copy.c itself does pretty much the same thing ...

Yes. This got started as a patch to core because not all of the parts of COPY are externally callable, and aren't broken down in a way that allowed for use in a SRF.

I'll get to work on these suggestions.


Re: [HACKERS] COPY as a set returning function

From
Corey Huinker
Date:

I don't understand why do we have all these checks.  Can't we just pass
the values to COPY and let it apply the checks?  That way, when COPY is
updated to support multibyte escape chars (for example) we don't need to
touch this code.  Together with removing the unneeded braces that would
make these stanzas about six lines long instead of fifteen.

If I understand you correctly, COPY (via BeginCopyFrom) itself relies on having a relation in pg_class to reference for attributes.
In this case, there is no such relation. So I'd have to fake a relcache entry, or refactor BeginCopyFrom() to extract a ReturnSetInfo from the Relation and pass that along to a new function BeginCopyFromReturnSet. I'm happy to go that route if you think it's a good idea.
 


> +             tuple = heap_form_tuple(tupdesc,values,nulls);
> +             //tuple = BuildTupleFromCStrings(attinmeta, field_strings);
> +             tuplestore_puttuple(tupstore, tuple);

No need to form a tuple; use tuplestore_putvalues here.

Good to know!

 

I wonder if these should be an auxiliary function in copy.c to do this.
Surely copy.c itself does pretty much the same thing ...

Yes. This got started as a patch to core because not all of the parts of COPY are externally callable, and aren't broken down in a way that allowed for use in a SRF.

I'll get to work on these suggestions.

I've put in some more work on this patch, mostly just taking Alvaro's suggestions, which resulted in big code savings.

I had to add a TupleDesc parameter to BeginCopy() and BeginCopyFrom(). This seemed the easiest way to leverage the existing tested code (and indeed, it worked nearly out-of-the-box). The only drawback is that a minor change will have to be made to the BeginCopyFrom() call in file_fdw.c, and any other extensions that leverage COPY. We could make compatibility functions that take the original signature and pass it along to the corresponding function with rsTupDesc set to NULL.

Some issues:
- I'm still not sure if the direction we want to go is a set returning function, or a change in grammar that lets us use COPY as a CTE or similar.
- This function will have the same difficulties as adding the program option did to file_fdw: there's very little we can reference that isn't os/environment specific
- Inline (STDIN) prompts the user for input, but gives the error: server sent data ("D" message) without prior row description ("T" message). I looked for a place where the Relation was consulted for the row description, but I'm not finding it.

I can continue to flesh this out with documentation and test cases if there is consensus that this is the way to go.


# select * from copy_srf('echo "x\ty"',true) as t(x text, y text);
 x | y
---+---
 x | y
(1 row)

Time: 1.074 ms
# select * from copy_srf('echo "x\t4"',true) as t(x text, y integer);
 x | y
---+---
 x | 4
(1 row)

Time: 1.095 ms
# select * from copy_srf(null) as t(x text, y integer);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> a    4
>> b    5
>> \.
server sent data ("D" message) without prior row description ("T" message)

Attachment

Re: [HACKERS] COPY as a set returning function

From
David Fetter
Date:
On Wed, Jan 25, 2017 at 08:51:38PM -0500, Corey Huinker wrote:
> I've put in some more work on this patch, mostly just taking Alvaro's
> suggestions, which resulted in big code savings.

The patch applies atop master.

The test (ls) which previously crashed the backend now doesn't, and
works in a reasonable way.

The description of the function still talks about its being a proof of
concept.

There are still neither regression tests nor SGML documentation.

Are we at a point where we should add these things?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] COPY as a set returning function

From
Peter Eisentraut
Date:
On 1/25/17 8:51 PM, Corey Huinker wrote:
> # select * from copy_srf('echo "x\ty"',true) as t(x text, y text);

I find these parameters weird.  Just looking at this, one has no idea
what the "true" means.  Why not have a "filename" and a "program"
parameter and make them mutually exclusive?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] COPY as a set returning function

From
Peter Eisentraut
Date:
On 1/27/17 8:07 PM, David Fetter wrote:
> There are still neither regression tests nor SGML documentation.
> 
> Are we at a point where we should add these things?

One could argue about the documentation at this point, since the
function is somewhat self-documenting for the time being.  But surely
there should be tests.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] COPY as a set returning function

From
Corey Huinker
Date:
On Fri, Jan 27, 2017 at 9:42 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 1/25/17 8:51 PM, Corey Huinker wrote:
> # select * from copy_srf('echo "x\ty"',true) as t(x text, y text);

I find these parameters weird.  Just looking at this, one has no idea
what the "true" means.  Why not have a "filename" and a "program"
parameter and make them mutually exclusive?

It was done that way to match the parameters of BeginCopyFrom() and it could easily be changed.

I suppose I could have written it as:
select * from copy_srf(filename => 'echo "x\ty"', is_program => true) as t(x text, y text);

But this goes to the core of this patch/poc: I don't know where we want to take it next.

Options at this point are:
1. Continue with a SRF, in which case discussions about parameters are completely valid.
2. Add a RETURNING clause to COPY. This would dispense with the parameters problem, but potentially create others.
3. Add the TupDesc parameter to BeginCopyFrom, make sure all data structures are visible to an extension, and call it a day. If someone wants to write an extension that makes their own copy_srf(), they can.
4. Someone else's better idea.


 

Re: [HACKERS] COPY as a set returning function

From
Michael Paquier
Date:
On Tue, Jan 31, 2017 at 3:05 AM, Corey Huinker <corey.huinker@gmail.com> wrote:
> On Fri, Jan 27, 2017 at 9:42 PM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>>
>> On 1/25/17 8:51 PM, Corey Huinker wrote:
>> > # select * from copy_srf('echo "x\ty"',true) as t(x text, y text);
>>
>> I find these parameters weird.  Just looking at this, one has no idea
>> what the "true" means.  Why not have a "filename" and a "program"
>> parameter and make them mutually exclusive?
>
>
> It was done that way to match the parameters of BeginCopyFrom() and it could
> easily be changed.
>
> I suppose I could have written it as:
>
> select * from copy_srf(filename => 'echo "x\ty"', is_program => true) as t(x
> text, y text);
>
>
> But this goes to the core of this patch/poc: I don't know where we want to
> take it next.
>
> Options at this point are:
> 1. Continue with a SRF, in which case discussions about parameters are
> completely valid.
> 2. Add a RETURNING clause to COPY. This would dispense with the parameters
> problem, but potentially create others.
> 3. Add the TupDesc parameter to BeginCopyFrom, make sure all data structures
> are visible to an extension, and call it a day. If someone wants to write an
> extension that makes their own copy_srf(), they can.
> 4. Someone else's better idea.

Here is a 4: Refactoring BeginCopyFrom so as instead of a Relation are
used a TupleDesc, a default expression list, and a relation name. You
could as well make NextCopyFrom() smarter so as it does nothing if no
expression contexts are given by the caller, which is the case of your
function here. To be honest, I find a bit weird to use
NextCopyFromRawFields when there is already a bunch of sanity checks
happening in NextCopyFrom(), where you surely want to have them
checked even for your function.

Looking at the last patch proposed, I find the current patch proposed
as immature, as rsTupDesc basically overlaps with the relation a
caller can define in BeginCopyFrom(), so marking this patch as
"returned with feedback".
-- 
Michael



Re: [HACKERS] COPY as a set returning function

From
Corey Huinker
Date:

Here is a 4: Refactoring BeginCopyFrom so as instead of a Relation are
used a TupleDesc, a default expression list, and a relation name. You
could as well make NextCopyFrom() smarter so as it does nothing if no
expression contexts are given by the caller, which is the case of your
function here. To be honest, I find a bit weird to use
NextCopyFromRawFields when there is already a bunch of sanity checks
happening in NextCopyFrom(), where you surely want to have them
checked even for your function.

Looking at the last patch proposed, I find the current patch proposed
as immature, as rsTupDesc basically overlaps with the relation a
caller can define in BeginCopyFrom(), so marking this patch as
"returned with feedback".
--
Michael

I like that suggestion and will move forward on it. I wasn't sure there would be support for such a refactoring.

As for the copy from stdin case, Andrew Gierth laid out why that's nearly impossible to unwind in the network protocol (the act of starting the copy causes the query result to end before any rows were returned), and that might make STDIN input a dead issue.