Re: Review : Add hooks for pre- and post-processor executables for COPY and \copy - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Review : Add hooks for pre- and post-processor executables for COPY and \copy
Date
Msg-id 512D08DD.8050109@vmware.com
Whole thread Raw
In response to Re: Review : Add hooks for pre- and post-processor executables for COPY and \copy  ("Etsuro Fujita" <fujita.etsuro@lab.ntt.co.jp>)
Responses Re: Review : Add hooks for pre- and post-processor executables for COPY and \copy  (Amit Kapila <amit.kapila@huawei.com>)
List pgsql-hackers
On 22.02.2013 12:43, Etsuro Fujita wrote:
>>>>>> 1. "Broken pipe" is not handled in case of psql "\copy" command;
>>>>>>      Issue are as follows:
>>>>>>          Following are verified on SuSE-Linux 10.2.
>>>>>>          1) psql is exiting when "\COPY xxx TO" command is issued
>>>>>> and
>>>> command/script is not found
>>>>>>                   When popen is called in write mode it is creating
>>>>>> valid
>>>> file descriptor and when it tries to write to file "Broken pipe"
>>> error
>>>> is>  coming which is not handled.
>>>>>>                          psql# \copy pgbench_accounts TO PROGRAM
>>>> '../compress.sh pgbench_accounts4.txt'
>>>>>>          2) When "\copy" command is in progress then
>>> program/command
>>>>>> is
>>>> killed/"crashed due to any problem"
>>>>>>             psql is exiting.
>>>>
>>>>> This is a headache.  I have no idea how to solve this.
>>>>
>>>> I think we can keep it for committer to take a call on this issue.
>>>
>>> Agreed.

Fortunately this one is easy. We just need to ignore SIGPIPE, by calling
pqsignal(SIGPIPE, SIG_IGN) before popen(). We already do the same in
psql when the query output is piped to a pager, in PageOutput.

>>>> 5. Following in copy.sgml can be changed to make more meaningful as
>>>> the first line looks little adhoc.
>>>> +<para>
>>>> +      The command that input comes from or that output goes to.
>>>> +      The command for COPY FROM, which input comes from, must write
>>>> + its
>>>> output
>>>> +      to standard output.  The command for COPY TO, which output
>>> goes
>>>> + to,
>>>> must
>>>> +      read its input from standard input.
>>>> +</para>
>>>
>>> I've struggled to make the document more meaningful.
>>
>> To be honest, I am not sure whether introducing pre, post processor
>> terminology is right or not,
>> But again I shall let committer decide about this point.
>
> Agreed.

I changed the terminology to use terms like "the command specified with
PROGRAM", instead of talking about pre- and post-processsors.

>> I have one small another doubt that in function parse_slash_copy, you
>> avoided expand tilde
>> for program case, which I am not sure is the right thing or not.
>
> Sorry, I'm not sure that, too.  I'd like to leave this for committers.

That seems correct. The shell will do tilde expansion with popen(), we
don't need to do it ourselves.

There's one oddity in the psql \copy syntax. This is actually present in
earlier versions too, but I think we should fix it now that we extend
the syntax:

  -- Writes to standard output. There's no way to write to a file called
"stdout".
  \copy foo to 'stdout'

I think we should change the interpretation of the above so that it
writes to a file called "stdout". It's possible that there's an
application out there that relies on that to write to stdout, but it's
not hard to fix the application. A small note in the release notes might
be in order.

Also, I think we should require the command to be quoted in \copy. Ie.
let's forbid this:

\copy pgbench_accounts to program /bin/cat>/dev/null

and require that it's written as:

\copy pgbench_accounts to program '/bin/cat>/dev/null'

We don't require quoting for filenames, which I find a bit weird too,
but it seems particularly confusing for a shell command.

Attached is a new version of this patch that I almost committed, but one
thing caught my eye at the last minute: The error reporting is not very
user-friendly. If the program fails after reading/writing some rows, the
reason is printed to the log, but not the user:

postgres=# copy foo to program '/tmp/midway-crash';
ERROR:  could not execute command "/tmp/midway-crash"

the log has more details:

LOG:  child process exited with exit code 10
STATEMENT:  copy foo to program '/tmp/midway-crash';
ERROR:  could not execute command "/tmp/midway-crash"
STATEMENT:  copy foo to program '/tmp/midway-crash';

I think we should arrange for the "child process exited with exit code
10" to be printed as errdetail to the client. Similarly, with psql \copy
command, the "child process exited with exit code 10" command shouldn't
be printed straight to stderr, but should go through psql_error.

I'll try to refactor pclose_check tomorrow to do that. Meanwhile, please
take a look at the attached if you have the time. I rewrote much of the
docs changes, and some comments.

- Heikki

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pg_basebackup caused FailedAssertion
Next
From: Tom Lane
Date:
Subject: Re: bugfix: --echo-hidden is not supported by \sf statements