Thread: Function to execute a program

Function to execute a program

From
Magnus Hagander
Date:
Today we have pg_read_file() and pg_read_binary_file(), which mostly correspond to COPY from a file, but reads a whole (or parts but based on bytes) into a single value rather than rows.

Would it make sense to have a pg_execute_program() that corresponds to COPY FROM PROGRAM? This would obviously have the same permissions restrictions as COPY FROM PROGRAM.

The usecase would be to for example execute a command that returns json format output, which could then be parsed and processed as part of a query. Today, COPY FROM PROGRAM cannot do this, as we can't read a multiline json value with COPY.

Thoughts?

//Magnus

Re: Function to execute a program

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> Would it make sense to have a pg_execute_program() that corresponds to COPY
> FROM PROGRAM? This would obviously have the same permissions restrictions
> as COPY FROM PROGRAM.
> The usecase would be to for example execute a command that returns json
> format output, which could then be parsed and processed as part of a query.
> Today, COPY FROM PROGRAM cannot do this, as we can't read a multiline json
> value with COPY.

copy ... from program 'random_json_producer | tr "\n\t" "  "';

I don't necessarily object to providing such a function to make it
easier, but it's not the case that you can't have the functionality
today.

            regards, tom lane



Re: Function to execute a program

From
Magnus Hagander
Date:


On Sat, Sep 12, 2020 at 5:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
> Would it make sense to have a pg_execute_program() that corresponds to COPY
> FROM PROGRAM? This would obviously have the same permissions restrictions
> as COPY FROM PROGRAM.
> The usecase would be to for example execute a command that returns json
> format output, which could then be parsed and processed as part of a query.
> Today, COPY FROM PROGRAM cannot do this, as we can't read a multiline json
> value with COPY.

copy ... from program 'random_json_producer | tr "\n\t" "  "';

I don't necessarily object to providing such a function to make it
easier, but it's not the case that you can't have the functionality
today.

"tr" is not typically available on Windows, for one :)

But yes, assuming tr is available, it is true that you can. (And you can perhaps find something else to pipe it through on Windows). Of course, you still can't use it in a query, since COPY can only target a table :) Independently of something like this it would be nice to be able to target say a CTE with COPY, but that's kan entirely different topic.

--

Re: Function to execute a program

From
Andrew Dunstan
Date:
On 9/12/20 11:11 AM, Magnus Hagander wrote:
>
>
> On Sat, Sep 12, 2020 at 5:06 PM Tom Lane <tgl@sss.pgh.pa.us
> <mailto:tgl@sss.pgh.pa.us>> wrote:
>
>     Magnus Hagander <magnus@hagander.net <mailto:magnus@hagander.net>>
>     writes:
>     > Would it make sense to have a pg_execute_program() that
>     corresponds to COPY
>     > FROM PROGRAM? This would obviously have the same permissions
>     restrictions
>     > as COPY FROM PROGRAM.
>     > The usecase would be to for example execute a command that
>     returns json
>     > format output, which could then be parsed and processed as part
>     of a query.
>     > Today, COPY FROM PROGRAM cannot do this, as we can't read a
>     multiline json
>     > value with COPY.
>
>     copy ... from program 'random_json_producer | tr "\n\t" "  "';
>
>     I don't necessarily object to providing such a function to make it
>     easier, but it's not the case that you can't have the functionality
>     today.
>
>
> "tr" is not typically available on Windows, for one :)
>
> But yes, assuming tr is available, it is true that you can. (And you
> can perhaps find something else to pipe it through on Windows). Of
> course, you still can't use it in a query, since COPY can only target
> a table :) Independently of something like this it would be nice to be
> able to target say a CTE with COPY, but that's kan entirely different
> topic.
>
>

A more robust recipe  would use "jq -c" as the filter. And it's
available on Windows via chocolatey.


I don't have a strong opinion on the suggested facility. Presumably you
can construct a function that runs COPY into a temp table and then
returns the results. But maybe that's more work than we should require
users to perform.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Function to execute a program

From
Stephen Frost
Date:
Greetings,

* Magnus Hagander (magnus@hagander.net) wrote:
> Would it make sense to have a pg_execute_program() that corresponds to COPY
> FROM PROGRAM? This would obviously have the same permissions restrictions
> as COPY FROM PROGRAM.

Eh, perhaps.

> The usecase would be to for example execute a command that returns json
> format output, which could then be parsed and processed as part of a query.
> Today, COPY FROM PROGRAM cannot do this, as we can't read a multiline json
> value with COPY.

I'd rather come up with a way to import this kind of object into PG by
using COPY rather than adding a different way to pull them in.

In the "this is a crazy idea" category- have a way to specify an input
function to COPY, or a target data type (and then use it's input
function) to which COPY just keeps feeding bytes until the function
comes back with "ok, I got a value to return" or something.

Would be really neat to have a way to COPY in a JSON, XML, or whatever
even if it's multi-line and even if it has multiple objects (ie: 10
complete JSON documents in a single file would become 10 rows of jsonb).

Thanks,

Stephen

Attachment

Re: Function to execute a program

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Magnus Hagander (magnus@hagander.net) wrote:
>> Would it make sense to have a pg_execute_program() that corresponds to COPY
>> FROM PROGRAM? This would obviously have the same permissions restrictions
>> as COPY FROM PROGRAM.

> I'd rather come up with a way to import this kind of object into PG by
> using COPY rather than adding a different way to pull them in.

I'm not for overloading COPY to try to make it handle every data import
use-case.  The issue here AIUI is that Magnus wants the program output
to be read as an uninterpreted blob (which he'll then try to convert to
jsonb or whatever, but that's not the concern of the import code).  This
is exactly antithetical to COPY's mission of reading some rows that are
made up of some columns and putting the result into a table.

Yeah, we could no doubt add some functionality to disable all the
row-splitting and column-splitting and associated escaping logic,
but that's going to make COPY slower and more complicated.  And it
still doesn't address wanting to use the result directly in a query
instead of sticking it into a table.

            regards, tom lane



Re: Function to execute a program

From
Stephen Frost
Date:
Greetings.

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Magnus Hagander (magnus@hagander.net) wrote:
> >> Would it make sense to have a pg_execute_program() that corresponds to COPY
> >> FROM PROGRAM? This would obviously have the same permissions restrictions
> >> as COPY FROM PROGRAM.
>
> > I'd rather come up with a way to import this kind of object into PG by
> > using COPY rather than adding a different way to pull them in.
>
> I'm not for overloading COPY to try to make it handle every data import
> use-case.  The issue here AIUI is that Magnus wants the program output
> to be read as an uninterpreted blob (which he'll then try to convert to
> jsonb or whatever, but that's not the concern of the import code).  This
> is exactly antithetical to COPY's mission of reading some rows that are
> made up of some columns and putting the result into a table.

I don't really think there's anything inherent in the fact that "COPY"
today only has one way to handle data that the user wants to import that
it should be required to always operate in that manner.

As for slowing down the current method- I don't think that we'd
implement such a change as just a modification to the existing optimized
parsing code as that just wouldn't make any sense and would slow COPY
down for this use-case, but having a COPY command that's able to work in
a few different modes when it comes to importing data seems like it
could be sensible, fast, and clear to users.

One could imagine creating some other top-level command to handle more
complex import cases than what COPY does today but I don't actually
think that'd be an improvment.

> Yeah, we could no doubt add some functionality to disable all the
> row-splitting and column-splitting and associated escaping logic,
> but that's going to make COPY slower and more complicated.  And it
> still doesn't address wanting to use the result directly in a query
> instead of sticking it into a table.

The way that's handled for the cases that COPY does work with today is
file_fdw.  Ideally, we'd do the same here.

Ultimately, COPY absolutely *is* our general data import tool- it's just
that today we push some of the work to make things 'fit' on the user and
that ends up with pain points like exactly what Magnus has pointed out
here.  We should be looking to improve that situation, and I don't
really care for the solution to that being "create some random other new
thing for data import that users then have to know exists and learn how
to use".

Thanks,

Stephen

Attachment