Re: Function to execute a program - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Function to execute a program
Msg-id 20200914145705.GK3063@tamriel.snowman.net
In response to Re: Function to execute a program  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Greetings.

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Magnus Hagander (magnus@hagander.net) wrote:
> >> Would it make sense to have a pg_execute_program() that corresponds to COPY
> >> FROM PROGRAM? This would obviously have the same permissions restrictions
> >> as COPY FROM PROGRAM.
>
> > I'd rather come up with a way to import this kind of object into PG by
> > using COPY rather than adding a different way to pull them in.
>
> I'm not for overloading COPY to try to make it handle every data import
> use-case.  The issue here AIUI is that Magnus wants the program output
> to be read as an uninterpreted blob (which he'll then try to convert to
> jsonb or whatever, but that's not the concern of the import code).  This
> is exactly antithetical to COPY's mission of reading some rows that are
> made up of some columns and putting the result into a table.

I don't think the fact that "COPY" today has only one way to handle the
data a user wants to import means that it should be required to always
operate in that manner.

As for slowing down the current method: I don't think that we'd
implement such a change as just a modification to the existing optimized
parsing code, as that wouldn't make any sense and would slow COPY down
for this use-case. Having a COPY command that's able to work in a few
different modes when it comes to importing data, though, seems like it
could be sensible, fast, and clear to users.

One could imagine creating some other top-level command to handle more
complex import cases than what COPY does today but I don't actually
think that'd be an improvement.

> Yeah, we could no doubt add some functionality to disable all the
> row-splitting and column-splitting and associated escaping logic,
> but that's going to make COPY slower and more complicated.  And it
> still doesn't address wanting to use the result directly in a query
> instead of sticking it into a table.

The way that's handled for the cases that COPY does work with today is
file_fdw.  Ideally, we'd do the same here.
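
For reference, a minimal sketch of how file_fdw already lets program output
be used directly in a query for the row-and-column case. It assumes the
file_fdw contrib module is available (the "program" option exists in
PostgreSQL 10 and later) and that the caller is a superuser or has
pg_execute_server_program. The server name, table name, and the script
path /usr/local/bin/list_orders.sh are hypothetical; the script is assumed
to print CSV rows of the form "id,amount".

```sql
-- Sketch only: object names and the script path are hypothetical.
CREATE EXTENSION IF NOT EXISTS file_fdw;

CREATE SERVER program_src FOREIGN DATA WRAPPER file_fdw;

-- The program is run each time the foreign table is scanned, and its
-- output is parsed with the same row/column rules that COPY uses.
CREATE FOREIGN TABLE orders_from_program (
    id     integer,
    amount numeric
) SERVER program_src
  OPTIONS (program '/usr/local/bin/list_orders.sh', format 'csv');

-- The result can be queried without first loading it into a regular table.
SELECT sum(amount) FROM orders_from_program WHERE id > 100;
```

This covers data that is already made up of rows and columns; it does not
by itself give a way to read the whole output as one uninterpreted value.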

Ultimately, COPY absolutely *is* our general data import tool; it's just
that today we push some of the work to make things 'fit' on the user and
that ends up with pain points like exactly what Magnus has pointed out
here.  We should be looking to improve that situation, and I don't
really care for the solution to that being "create some random other new
thing for data import that users then have to know exists and learn how
to use".
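
To illustrate the kind of work that is pushed onto the user today, here is
a sketch of one common workaround for pulling a JSON document in through
COPY FROM PROGRAM. It is not a recommended interface, only an example of
the pain point. Assumptions: jq is installed on the server, the path
/tmp/example.json is hypothetical, the program emits the document on a
single line, and the bytes 0x01 and 0x02 never appear in the output (they
are chosen only so that CSV quote and delimiter handling never triggers).

```sql
-- Sketch only: the file path and table name are hypothetical.
CREATE TEMP TABLE raw_import (doc text);

-- "jq -c ." prints the document compactly on one line, so it arrives as a
-- single row.  QUOTE and DELIMITER are set to control characters that are
-- assumed never to occur in the data, which effectively disables
-- column-splitting and quoting for this load.
COPY raw_import (doc)
FROM PROGRAM 'jq -c . /tmp/example.json'
WITH (FORMAT csv, QUOTE E'\x01', DELIMITER E'\x02');

-- Interpreting the blob is a separate step from importing it.
SELECT doc::jsonb FROM raw_import;
```

The user has to know about the quote and delimiter trick, stage the data in
a table, and convert it afterwards, which is the sort of friction being
discussed here.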

Thanks,

Stephen

