Thread: [GENERAL] Removing null bytes from a json column

[GENERAL] Removing null bytes from a json column

From
Timothy Garnett
Date:
Does anyone have some tips on how to deal with an existing json type column that has some null bytes ( \u0000) in it? It seems like anything I do that touches any row with a null byte just errors.  I'd love to just remove them if I could find some way to find them, but I'm having trouble even figuring out how to do that.

This error I get is always:

PG::UntranslatableCharacter: ERROR:  unsupported Unicode escape sequence
DETAIL:  \u0000 cannot be converted to text.
CONTEXT:  JSON data, line 1: 
...st_name":"efxkerbs","company":"efxkerbs","email":...

Convert the column to jsonb -> errors.

Trying to find one of the offending rows:
select id from xxx where command->'args'->2->>'company' = 'efxkerbs';
 - same error on \u0000
select id from issued_crm_commands where command->'args'->2->'company' = '"efxkerbs"':json;
 - no equality operator between json
select id from issued_crm_commands where command->'args'->2->'company'::bytea = 'efxkerbs'::bytea;
 - no conversion from json to bytea

Any ideas on how to find rows with a \u0000 in the json?

Tim

Re: [GENERAL] Removing null bytes from a json column

From
Tom Lane
Date:
Timothy Garnett <tgarnett@panjiva.com> writes:
> Does anyone have some tips on how to deal with an existing json type column
> that has some null bytes ( \u0000) in it? It seems like anything I do that
> touches any row with a null byte just errors.  I'd love to just remove them
> if I could find some way to find them, but I'm having trouble even figuring
> out how to do that.

Doesn't it work to cast to text and do a LIKE or regex search?

regression=# select '{"z":"\u0000"}'::json::text ~ '\\u0000';
 ?column?
----------
 t
(1 row)

regression=# select '{"z":"\u0001"}'::json::text ~ '\\u0000';
 ?column?
----------
 f
(1 row)

It's true that we won't let you cast such a value to JSONB or do any
exciting JSON-ish manipulations on it, but I'm not seeing an error
in cast-to-text.

            regards, tom lane