Re: WIP patch: add (PRE|POST)PROCESSOR options to COPY - Mailing list pgsql-hackers

From Tom Lane
Subject Re: WIP patch: add (PRE|POST)PROCESSOR options to COPY
Date
Msg-id 23999.1352921872@sss.pgh.pa.us
In response to Re: WIP patch: add (PRE|POST)PROCESSOR options to COPY  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: WIP patch: add (PRE|POST)PROCESSOR options to COPY  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
Andrew Dunstan <andrew@dunslane.net> writes:
> On 11/14/2012 02:05 PM, Peter Eisentraut wrote:
>> Why don't you filter the data before it gets to stdin?  Some program is
>> feeding the data to "stdin" on the client side.  Why doesn't that do the
>> filtering?  I don't see a large advantage in having the data be sent
>> unfiltered to the server and having the server do the filtering.

> Centralization of processing would be one obvious reason.

If I understand correctly, what you're imagining is that the client
sources data to a COPY FROM STDIN type of command, then the backend
pipes that out to the stdin of some filtering program, and then reads
that program's stdout to get the data it processes and stores.

We could in principle make that work, but there are some pretty serious
implementation problems: popen doesn't do this so we'd have to cons up
our own fork and pipe setup code, and we would have to write a bunch of
asynchronous processing logic to account for the possibility that the
filter program doesn't return data in similar-size chunks to what it
reads.  (IOW, it will never be clear when to try to read data from the
filter and when to try to write data to it.)
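
To give a feel for how much machinery that involves, here is a minimal
sketch of a hand-rolled fork-and-pipe setup with a poll() loop that
interleaves writes to the filter with reads from it. It is an
illustration only, not code from the patch or from the backend: the
helper name run_filter, its signature, and the fixed-size buffers are
all assumptions made for this example, and a real backend version would
also need error reporting, interrupt handling, and streaming instead of
in-memory buffers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): run cmd through /bin/sh, feed it
 * in[0..in_len) on stdin, and keep up to out_cap bytes of its stdout in out.
 * Returns the number of bytes kept, or -1 on failure.
 */
static ssize_t
run_filter(const char *cmd, const char *in, size_t in_len,
           char *out, size_t out_cap)
{
    int     to_child[2], from_child[2], status;
    size_t  written = 0, got = 0;
    pid_t   pid;

    assert(cmd != NULL && out != NULL);
    if (pipe(to_child) < 0)
        return -1;
    if (pipe(from_child) < 0)
    {
        close(to_child[0]); close(to_child[1]);
        return -1;
    }
    pid = fork();
    if (pid < 0)
    {
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return -1;
    }
    if (pid == 0)
    {
        /* Child: wire the pipes to stdin/stdout, then exec the filter. */
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
        _exit(127);
    }

    /* Parent: keep only our ends of the two pipes. */
    close(to_child[0]);
    close(from_child[1]);
    int wfd = to_child[1], rfd = from_child[0];
    fcntl(wfd, F_SETFL, O_NONBLOCK);    /* a full pipe must not block us */
    signal(SIGPIPE, SIG_IGN);           /* process-wide: get EPIPE, not a signal */
    if (in_len == 0) { close(wfd); wfd = -1; }

    while (rfd >= 0)
    {
        char    scratch[4096];
        /* poll() ignores entries with a negative fd, so wfd = -1 is fine. */
        struct pollfd fds[2] = {{rfd, POLLIN, 0}, {wfd, POLLOUT, 0}};

        if (poll(fds, 2, -1) < 0)
        {
            if (errno == EINTR) continue;
            break;
        }
        if (fds[0].revents & (POLLIN | POLLHUP | POLLERR))
        {
            ssize_t n = read(rfd, scratch, sizeof scratch);
            if (n > 0)
            {
                /* Keep what fits; output beyond out_cap is discarded. */
                size_t room = out_cap - got;
                size_t keep = (size_t) n < room ? (size_t) n : room;
                memcpy(out + got, scratch, keep);
                got += keep;
            }
            else if (n == 0 || errno != EINTR)
            { close(rfd); rfd = -1; }   /* EOF or hard error */
        }
        if (wfd >= 0 && (fds[1].revents & (POLLOUT | POLLERR)))
        {
            ssize_t n = write(wfd, in + written, in_len - written);
            if (n > 0)
                written += (size_t) n;
            else if (n < 0 && errno != EAGAIN && errno != EINTR)
                written = in_len;       /* e.g. EPIPE: stop feeding the filter */
            if (written == in_len) { close(wfd); wfd = -1; }  /* send EOF */
        }
    }
    if (rfd >= 0) close(rfd);
    if (wfd >= 0) close(wfd);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? (ssize_t) got : -1;
}
```

A call such as run_filter("tr a-z A-Z", buf, len, out, sizeof out)
pushes buf through the filter. Even this toy needs non-blocking writes,
SIGPIPE handling, and separate EOF tracking for each direction, which is
the sort of thing I mean by asynchronous processing logic.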

I think it's way too complicated for the amount of functionality you'd
get.  As Peter says, there's no strong reason not to do such processing
on the client side.  In fact there are pretty strong reasons to prefer
to do it there, like not needing database superuser privilege to invoke
the filter program.

What I'm imagining is a very very simple addition to COPY that just
allows it to execute popen() instead of fopen() to read or write the
data source/sink.  What you suggest would require hundreds of lines and
create many opportunities for new bugs.
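
For comparison, the popen()-for-fopen() swap can be sketched in a few
lines. This is a rough illustration under stated assumptions, not the
proposed patch: the CopySource struct and the copy_source_open and
copy_source_close names are hypothetical, and the only real calls used
are the standard fopen/fclose and POSIX popen/pclose.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical handle: remembers how the stream was opened, because a
 * popen() stream must be closed with pclose(), not fclose(). */
typedef struct CopySource
{
    FILE   *fp;
    int     is_program;
} CopySource;

/*
 * Open target either as a plain file or as a shell command.
 * for_write selects mode "w" (data sink) or "r" (data source).
 * Returns 0 on success, -1 on failure.
 */
static int
copy_source_open(CopySource *src, const char *target,
                 int is_program, int for_write)
{
    const char *mode = for_write ? "w" : "r";

    assert(src != NULL && target != NULL);
    src->is_program = is_program;
    src->fp = is_program ? popen(target, mode) : fopen(target, mode);
    return src->fp != NULL ? 0 : -1;
}

/*
 * Close the stream. For a program this returns the wait status reported
 * by pclose() (0 for a clean exit); for a file, the fclose() result.
 */
static int
copy_source_close(CopySource *src)
{
    int     rc;

    assert(src != NULL);
    if (src->fp == NULL)
        return -1;
    rc = src->is_program ? pclose(src->fp) : fclose(src->fp);
    src->fp = NULL;
    return rc;
}
```

Everything between open and close stays ordinary stdio reads or writes
on src.fp, because the pipe is one-directional and the existing
file-reading code does not need to know which kind of stream it has.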
        regards, tom lane