Thread: COPY IN as SELECT target

COPY IN as SELECT target

From
Andrew Dunstan
Date:
Recently there was discussion about allowing a COPY statement to be a 
SELECT target, returning a text array, although the syntax wasn't really 
nailed down that I recall. I was thinking that  we might have
   COPY RETURNING ARRAY FROM ...

instead of
   COPY tablename opt_column_list FROM ...


the we possibly could do things like:
   SELECT t[5] as a, 3*(t[3]::numeric) as b FROM (COPY RETURNING ARRAY 
FROM STDIN CSV) as t;

Thoughts?

cheers

andrew



Re: COPY IN as SELECT target

From
Heikki Linnakangas
Date:
Andrew Dunstan wrote:
> 
> Recently there was discussion about allowing a COPY statement to be a
> SELECT target, returning a text array, although the syntax wasn't really
> nailed down that I recall. I was thinking that  we might have
> 
>    COPY RETURNING ARRAY FROM ...
> 
> instead of
> 
>    COPY tablename opt_column_list FROM ...

It's not really returning an array, is it? It's returning a bag of rows
like a (sub)query.

> the we possibly could do things like:
> 
>    SELECT t[5] as a, 3*(t[3]::numeric) as b FROM (COPY RETURNING ARRAY
> FROM STDIN CSV) as t;

How about just COPY FROM? As in

SELECT t[5] as a, 3*(t[3]::numeric) as b FROM (COPY FROM STDIN CSV) as t

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: COPY IN as SELECT target

From
Robert Haas
Date:
On Thu, Dec 17, 2009 at 12:23 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> How about just COPY FROM? As in
>
> SELECT t[5] as a, 3*(t[3]::numeric) as b FROM (COPY FROM STDIN CSV) as t

I had the same thought.  Though it would also be nice to allow something like:

COPY (type1, type2, type3, type4) FROM STDIN CSV

...which is obviously going to create a horrible parser problem if you
actually tried to use that syntax.

...Robert


Re: COPY IN as SELECT target

From
David Fetter
Date:
On Thu, Dec 17, 2009 at 12:28:50PM -0500, Robert Haas wrote:
> On Thu, Dec 17, 2009 at 12:23 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
> > How about just COPY FROM? As in
> >
> > SELECT t[5] as a, 3*(t[3]::numeric) as b FROM (COPY FROM STDIN
> > CSV) as t
> 
> I had the same thought.  Though it would also be nice to allow
> something like:
> 
> COPY (type1, type2, type3, type4) FROM STDIN CSV
> 
> ...which is obviously going to create a horrible parser problem if
> you actually tried to use that syntax.

How about using the CTE syntax?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: COPY IN as SELECT target

From
Robert Haas
Date:
On Thu, Dec 17, 2009 at 12:38 PM, David Fetter <david@fetter.org> wrote:
> On Thu, Dec 17, 2009 at 12:28:50PM -0500, Robert Haas wrote:
>> On Thu, Dec 17, 2009 at 12:23 PM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>> > How about just COPY FROM? As in
>> >
>> > SELECT t[5] as a, 3*(t[3]::numeric) as b FROM (COPY FROM STDIN
>> > CSV) as t
>>
>> I had the same thought.  Though it would also be nice to allow
>> something like:
>>
>> COPY (type1, type2, type3, type4) FROM STDIN CSV
>>
>> ...which is obviously going to create a horrible parser problem if
>> you actually tried to use that syntax.
>
> How about using the CTE syntax?

I'm not sure what you're suggesting exactly, but the problem with the
syntax I suggested is that COPY (...) TO <whatever> expects the "..."
part to be a subselect.  You can't make COPY (...) FROM have something
in there other than a subselect, because the parser can't fast-forward
and look at the word FROM and then go back and decide how to parse the
parenthesized stuff.  That's almost magic in the general case.  You'd
have to stick a keyword in there before the opening parentheses.

...Robert


Re: COPY IN as SELECT target

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Andrew Dunstan wrote:
>> COPY RETURNING ARRAY FROM ...

> It's not really returning an array, is it? It's returning a bag of rows
> like a (sub)query.

> How about just COPY FROM?

The problem with COPY FROM is that it hard-wires a decision that there
is one and only one possible result format, which I think we pretty
much proved already is the wrong thing.  I'm not thrilled with "RETURNING
ARRAY" either, but we need to leave ourselves wiggle room to have more
than one result format from the same source file.
        regards, tom lane


Re: COPY IN as SELECT target

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>   
>> Andrew Dunstan wrote:
>>     
>>> COPY RETURNING ARRAY FROM ...
>>>       
>
>   
>> It's not really returning an array, is it? It's returning a bag of rows
>> like a (sub)query.
>>     
>
>   
>> How about just COPY FROM?
>>     
>
> The problem with COPY FROM is that it hard-wires a decision that there
> is one and only one possible result format, which I think we pretty
> much proved already is the wrong thing.  I'm not thrilled with "RETURNING
> ARRAY" either, but we need to leave ourselves wiggle room to have more
> than one result format from the same source file.
>
>             
>   

Well, we could have "RETURNING type-expression" with  "text[]" supported 
for the first iteration.

In answer to Heiki's argument, what I wanted was exactly to return an 
array of text for each row. Whatever we have needs to be able to handle 
to possibility of ragged input (see previous discussion) so we can't tie 
it down too tightly.

cheers

andrew


Re: COPY IN as SELECT target

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> The problem with COPY FROM is that it hard-wires a decision that there
>> is one and only one possible result format, which I think we pretty
>> much proved already is the wrong thing.  I'm not thrilled with "RETURNING
>> ARRAY" either, but we need to leave ourselves wiggle room to have more
>> than one result format from the same source file.

> Well, we could have "RETURNING type-expression" with  "text[]" supported 
> for the first iteration.

> In answer to Heiki's argument, what I wanted was exactly to return an 
> array of text for each row. Whatever we have needs to be able to handle 
> to possibility of ragged input (see previous discussion) so we can't tie 
> it down too tightly.

I think that there are two likely possibilities for the result format:

* "Raw" data after just the de-escaping and column separation steps.
Array of text is probably the right thing here, at least for a text COPY
(doesn't seem to cover the binary case though).

* The data converted to some specified row type.

"RETURNING type-expression" is probably not good since it looks more
like the second case than the first --- and in fact it could be outright
ambiguous, what if your data actually is one column that is a text
array?

If we're willing to assume these are the *only* possibilities then we
could use "COPY FROM ..." for the first and "COPY RETURNING type-list
FROM ..." for the second.  I'm a bit uncomfortable with that assumption
though; it seems likely that we'll want to shoehorn in some more
alternatives later.  (Like, what about the binary case?)
        regards, tom lane


Re: COPY IN as SELECT target

From
Josh Berkus
Date:
> In answer to Heiki's argument, what I wanted was exactly to return an
> array of text for each row. Whatever we have needs to be able to handle
> to possibility of ragged input (see previous discussion) so we can't tie
> it down too tightly.

I would have *lots* of use for this feature.

Mind you, returning (arbitrary expression) would be even better, but if
we can get returning TEXT[] for 8.5, I think it's worth doing on its own.

--Josh Berkus



Re: COPY IN as SELECT target

From
Robert Haas
Date:
On Thu, Dec 17, 2009 at 1:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Tom Lane wrote:
>>> The problem with COPY FROM is that it hard-wires a decision that there
>>> is one and only one possible result format, which I think we pretty
>>> much proved already is the wrong thing.  I'm not thrilled with "RETURNING
>>> ARRAY" either, but we need to leave ourselves wiggle room to have more
>>> than one result format from the same source file.
>
>> Well, we could have "RETURNING type-expression" with  "text[]" supported
>> for the first iteration.
>
>> In answer to Heiki's argument, what I wanted was exactly to return an
>> array of text for each row. Whatever we have needs to be able to handle
>> to possibility of ragged input (see previous discussion) so we can't tie
>> it down too tightly.
>
> I think that there are two likely possibilities for the result format:
>
> * "Raw" data after just the de-escaping and column separation steps.
> Array of text is probably the right thing here, at least for a text COPY
> (doesn't seem to cover the binary case though).
>
> * The data converted to some specified row type.

Agreed.

> "RETURNING type-expression" is probably not good since it looks more
> like the second case than the first --- and in fact it could be outright
> ambiguous, what if your data actually is one column that is a text
> array?
>
> If we're willing to assume these are the *only* possibilities then we
> could use "COPY FROM ..." for the first and "COPY RETURNING type-list
> FROM ..." for the second.  I'm a bit uncomfortable with that assumption
> though; it seems likely that we'll want to shoehorn in some more
> alternatives later.  (Like, what about the binary case?)

You might want to specify column names as well as well as types, in
this second case.

...Robert


Re: COPY IN as SELECT target

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> You might want to specify column names as well as well as types, in
> this second case.

Well, we could do it like VALUES: arbitrarily name the columns column1
... columnN and tell people to use an alias if they want other names.
If it's convenient to fit column names into the syntax, good, but we
don't absolutely have to.

[ thinks... ] Although actually the obvious SQL-ish syntax for a rowtype
specification is
( colname typename [ , ... ] )

so that's probably what we'd want to do in the processed-data case.
Not sure about the raw-data case --- maybe a predetermined name is
okay there.
        regards, tom lane


Re: COPY IN as SELECT target

From
Robert Haas
Date:
On Thu, Dec 17, 2009 at 1:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> You might want to specify column names as well as well as types, in
>> this second case.
>
> Well, we could do it like VALUES: arbitrarily name the columns column1
> ... columnN and tell people to use an alias if they want other names.
> If it's convenient to fit column names into the syntax, good, but we
> don't absolutely have to.
>
> [ thinks... ] Although actually the obvious SQL-ish syntax for a rowtype
> specification is
>
>        ( colname typename [ , ... ] )
>
> so that's probably what we'd want to do in the processed-data case.

Yeah, I think that's good.

> Not sure about the raw-data case --- maybe a predetermined name is
> okay there.

I would expect so.

...Robert


Re: COPY IN as SELECT target

From
Dimitri Fontaine
Date:
Hi,

Le 17 déc. 2009 à 19:39, Josh Berkus a écrit :
> Mind you, returning (arbitrary expression) would be even better, but if
> we can get returning TEXT[] for 8.5, I think it's worth doing on its own.

Well, you already have it as soon as you have text[]:
INSERT INTO destinationSELECT row[0], row[1], myfunction(row[0], row[1]), row[2]::int + 1  FROM (COPY RETURNING text[]
FROM'/path/to/file.cvs' CVS HEADER) as file(row); 

Of course as Andrew said already what it needs that the syntax here does not cover is ragged file processing, that is
acceptingfile content when all the rows will not have the same number of columns. 

But if you have ragged input reading and COPY as a relation in a query, then you're able to apply any expression you
wantto in the query itself. Such as transforming the input slightly in order to conform to PostgreSQL datatype input
syntaxes,e.g. 

Regards,
--
dim

Let's deprecate pgloader.

Re: COPY IN as SELECT target

From
Pavel Stehule
Date:
2009/12/17 Andrew Dunstan <andrew@dunslane.net>:
>
> Recently there was discussion about allowing a COPY statement to be a SELECT
> target, returning a text array, although the syntax wasn't really nailed
> down that I recall. I was thinking that  we might have
>
>   COPY RETURNING ARRAY FROM ...
>
> instead of
>
>   COPY tablename opt_column_list FROM ...
>
>
> the we possibly could do things like:
>
>   SELECT t[5] as a, 3*(t[3]::numeric) as b FROM (COPY RETURNING ARRAY FROM
> STDIN CSV) as t;
>
> Thoughts?

In this case copy doesn't return array - so RETURNING ARRAY is little
bit strange.

what

SELECT .. FROM (COPY VALUES [(colums)] FROM ....)

Regards
Pavel

>
> cheers
>
> andrew
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: COPY IN as SELECT target

From
Andrew Dunstan
Date:

Pavel Stehule wrote:
> 2009/12/17 Andrew Dunstan <andrew@dunslane.net>:
>   
>> Recently there was discussion about allowing a COPY statement to be a SELECT
>> target, returning a text array, although the syntax wasn't really nailed
>> down that I recall. I was thinking that  we might have
>>
>>   COPY RETURNING ARRAY FROM ...
>>
>> instead of
>>
>>   COPY tablename opt_column_list FROM ...
>>
>>
>> the we possibly could do things like:
>>
>>   SELECT t[5] as a, 3*(t[3]::numeric) as b FROM (COPY RETURNING ARRAY FROM
>> STDIN CSV) as t;
>>
>> Thoughts?
>>     
>
> In this case copy doesn't return array - so RETURNING ARRAY is little
> bit strange.
>
> what
>
> SELECT .. FROM (COPY VALUES [(colums)] FROM ....)
>
>   


You are misunderstanding what I want to provide, which is that it *does* 
return an array of text for each line. That was what the previous 
discussion arrived at, and is illustrated in the example I showed above.


cheers

andrew


Re: COPY IN as SELECT target

From
Andrew Dunstan
Date:

Tom Lane wrote:
> I think that there are two likely possibilities for the result format:
>
> * "Raw" data after just the de-escaping and column separation steps.
> Array of text is probably the right thing here, at least for a text COPY
> (doesn't seem to cover the binary case though).
>
> * The data converted to some specified row type.
>
> "RETURNING type-expression" is probably not good since it looks more
> like the second case than the first --- and in fact it could be outright
> ambiguous, what if your data actually is one column that is a text
> array?
>
> If we're willing to assume these are the *only* possibilities then we
> could use "COPY FROM ..." for the first and "COPY RETURNING type-list
> FROM ..." for the second.  I'm a bit uncomfortable with that assumption
> though; it seems likely that we'll want to shoehorn in some more
> alternatives later.  (Like, what about the binary case?)
>
>             
>   

Yeah. I think we need an explicit marker.  The first of these cases is 
the one I'm particularly interested in. I think you could actually get 
the second from the first with a little more work anyway, but the "raw" 
input as an array lets me get the things I can't easily get another way.

I think we're going to need some marker such as parentheses to 
distinguish the second case. In that case, "RETURNING text[]" could be 
the first case and "RETURNING (text[])" could be the second, but maybe 
that's a bit too subtle. How about "RETURNING TYPE (type_list)" for the 
second case?

cheers

andrew