Re: backslash-dot quoting in COPY CSV - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: backslash-dot quoting in COPY CSV
Date
Msg-id 20190125030930.GC22001@momjian.us
Whole thread Raw
In response to backslash-dot quoting in COPY CSV  ("Daniel Verite" <daniel@manitou-mail.org>)
Responses Re: backslash-dot quoting in COPY CSV  ("Daniel Verite" <daniel@manitou-mail.org>)
Re: backslash-dot quoting in COPY CSV  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Wed, Jan  2, 2019 at 04:58:35PM +0100, Daniel Verite wrote:
>  Hi,
> 
> The doc on COPY CSV says about the backslash-dot sequence:
> 
>   To avoid any misinterpretation, a \. data value appearing as a
>   lone entry on a line is automatically quoted on output, and on
>   input, if quoted, is not interpreted as the end-of-data marker
> 
> However this quoting does not happen when \. is already part
> of a quoted field. Example:
> 
> COPY (select 'somevalue', E'foo\n\\.\nbar') TO STDOUT CSV;
> 
> outputs:
> 
> somevalue,"foo
> \.
> bar"
> 
> which conforms to the CSV rules, by which we are not allowed
> to replace \. by anything AFAICS.
> The trouble is, when trying to import this back with COPY FROM,
> it will error out at the backslash-dot and not import anything.
> Furthermore, if these data are meant to be embedded into a
> script, it creates a potential risk of SQL injection.
> 
> It is a known issue? I haven't found previous discussions on this.
> It looks to me like the ability of backslash-dot to be an end-of-data
> marker should be neutralizable for CSV. When the data is not embedded,
> it's not needed anyway, and when it's embedded, we could surely think
> of alternatives.

I was unable to reproduce the failure here using files:

    CREATE TABLE test (x TEXT);
    INSERT INTO test VALUES (E'foo\n\\.\nbar');

    COPY test TO STDOUT CSV;
    "foo
    \.
    bar"

    COPY test TO '/u/postgres/tmp/x' CSV;
    
    COPY test FROM '/u/postgres/tmp/x' CSV;

    SELECT * FROM test;
      x
    -----
     foo+
     \. +
     bar
     foo+
     \. +
     bar

but I am able to see the failure using STDIN:

    COPY test FROM STDIN CSV;
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself, or an EOF signal.
    "foo
    \.
    ERROR:  unterminated CSV quoted field
    CONTEXT:  COPY test, line 1: ""foo

This seems like a bug to me.  Looking at the code, psql issues the
prompts for STDIN, but when it sees \. alone on a line, it has no idea
you are in a quoted CSV string, so it thinks the copy is done and sends
the result to the server.  I can't see an easy way to fix this.  I guess
we could document it.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: WIP: Avoid creation of the free space map for small tables
Next
From: Chapman Flack
Date:
Subject: Re: PostgreSQL vs SQL/XML Standards