Re: copying json data and backslashes - Mailing list pgsql-general

From Erik Wienhold
Subject Re: copying json data and backslashes
Date
Msg-id 1342743333.378397.1669130157980@office.mailbox.org
Whole thread Raw
In response to copying json data and backslashes  (Alastair McKinley <a.mckinley@analyticsengines.com>)
Responses Re: copying json data and backslashes  ("pbj@cmicdo.com" <pbj@cmicdo.com>)
List pgsql-general
> On 22/11/2022 15:23 CET Alastair McKinley <a.mckinley@analyticsengines.com> wrote:
>
> Hi all,
>
> I have come across this apparently common issue COPY-ing json and wondering if
> there is potentially a better solution.
>
> I am copying data into a jsonb column originating from a 3rd party API. The
> data may have literal \r,\t,\n and also double backslashes.
>
> I discovered that I can cast this data to a jsonb value directly but I can't
> COPY the data without pre-processing.
>
> The example below illustrates my issue (only with \r, but the problem extends
> to other \X combinations).
>
> > do $$
> > lines=[r'{"test" : "\r this data has a carriage return"}']
> >
> > with open("/tmp/test1.json","w") as f:
> > for line in lines:
> > f.write(line.strip() + "\n")
> >
> > $$ language plpython3u;
> >
> > create temp table testing (data jsonb);
> >
> > -- this works
> > insert into testing (data)
> > select l::jsonb
> > from pg_read_file('/tmp/test1.json') f,
> > lateral regexp_split_to_table(f,'\n') l where l <> '';
> >
> > -- fails
> > copy testing (data) from '/tmp/test1.json';
> >
> > -- works
> > copy testing (data) from program $c$ sed -e 's/\\r/\\\\u000a/g' /tmp/test1.json $c$;
> >
>
> Is there any other solution with COPY that doesn't require manual
> implementation of search/replace to handle these edge cases?
> Why does ::jsonb work but COPY doesn't? It seems a bit inconsistent.

COPY handles special backslash sequences[1].  The \r in your sample JSON,
although properly escaped according to JSON, is replaced with an actual
carriage return by COPY before casting to jsonb.  The error results from JSON
prohibiting unescaped control characters in strings[2].

You must double escape to pass those characters through COPY.

See how COPY outputs backslash sequences:

    -- Actual carriage return:
    copy (select e'\r') to stdout;
    \r

    -- Backslash sequence for carriage return:
    copy (select '\r') to stdout;
    \\r

[1] https://www.postgresql.org/docs/current/sql-copy.html#id-1.9.3.55.9.2
[2] https://www.json.org/json-en.html

--
Erik



pgsql-general by date:

Previous
From: Alastair McKinley
Date:
Subject: copying json data and backslashes
Next
From: "pbj@cmicdo.com"
Date:
Subject: Re: copying json data and backslashes