Re: backslash-dot quoting in COPY CSV - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: backslash-dot quoting in COPY CSV
Date
Msg-id 20190130175059.GA26141@momjian.us
Whole thread Raw
In response to Re: backslash-dot quoting in COPY CSV  ("Daniel Verite" <daniel@manitou-mail.org>)
Responses Re: backslash-dot quoting in COPY CSV  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-hackers
On Wed, Jan 30, 2019 at 06:32:11PM +0100, Daniel Verite wrote:
>     Bruce Momjian wrote:
> > Well, these all kind of require a change to the COPY format, which
> > hasn't changed in many years.
> 
> Not for the first two. As an example of solution #2, it could look like this:
> 
> =# \set INLINE_COPY_BOUNDARY ==JuQW3gc2mQjXuvmJ32TlOLhJ3F2Eh2LcsBup0oH7==
> =# COPY table FROM STDIN CSV;
> somevalue,"foo
> \.
> bar"
> ==JuQW3gc2mQjXuvmJ32TlOLhJ3F2Eh2LcsBup0oH7==
> 
> Instead of looking for \. on a line by itself, psql would look for the
> boundary to know where the data ends.
> The boundary is not transmitted to the server, it has no need to know
> about it.

Wow, that is an odd API, as you stated below.

> > > - psql could be told somehow that the next piece of inline data is in
> > > the CSV format, and then pass it through a CSV parser.
> > 
> > That might be the cleanest solution, but how would we actually input
> > multi-line data in CSV mode with \. alone on a line?
> 
> With this solution, the content doesn't change at all.
> The weird part would be the user interface, because the information
> psql needs is not only "CSV", it's also the options DELIMITER, QUOTE,
> ESCAPE and possibly ENCODING. Currently it doesn't know any of these,
> they're passed to the server in an opaque, unparsed form within
> the COPY command.
> 
> Personally, the solution I find cleaner is the server side not having
> any end-of-data marker for CSV. So backslash-dot would never be
> special. psql could allow for a custom ending boundary for in-script
> data, and users could set that to backslash-dot if they want, but that
> would be their choice.
> That would be clearly not backward compatible, and I believe it wouldn't
> work with the v2 protocol, so I'm not sure it would have much chance of
> approval.

I had forgotten that the DELIMITER and QUOTE can be changed --- that
kills the idea of adding a simple CSV parser into psql because we would
have to parse the COPY SQL command as well.

I am wondering if we should just disallow CSV from STDIN, on security
grounds.  How big a problem would that be for people?  Would we have to
disable to STDOUT as well since it could not be restored?  Should we
issue some kind of security warning in such cases?  Should we document
this?

In hindsight, I am not sure how we could have designed this more
securly.  I guess we could have required some special text to start all
CSV continuation lines that were not end-of-file, but that would have
been very unportable, which is the goal of CSV.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


pgsql-hackers by date:

Previous
From: "Bossart, Nathan"
Date:
Subject: Re: A few new options for vacuumdb
Next
From: Pavel Stehule
Date:
Subject: Re: backslash-dot quoting in COPY CSV