Thread: JSON_VALUE() behavior when RETURNING bytea (expected base64 decoding)

JSON_VALUE() behavior when RETURNING bytea (expected base64 decoding)

From
Shay Rojansky
Date:
Greetings hackers,

The de-facto standard for storing binary data in JSON documents seems to be base64-encoded strings, so I was expecting JSON_VALUE's RETURNING bytea to do base64 decoding. However, that does not seem to be the case:

SELECT decode('AQID', 'base64'); -- 0x010203
SELECT JSON_VALUE(jsonb '"AQID"', '$' RETURNING bytea); -- Expected 0x010203, got AQID

This forces an additional explicit decode() function call:

SELECT decode(JSON_VALUE(jsonb '"AQID"', '$'), 'base64'); -- 0x010203

Is the above behavior intentional?

Shay

Re: JSON_VALUE() behavior when RETURNING bytea (expected base64 decoding)

From
"David G. Johnston"
Date:
On Wednesday, March 5, 2025, Shay Rojansky <roji@roji.org> wrote:
SELECT JSON_VALUE(jsonb '"AQID"', '$' RETURNING bytea); -- Expected 0x010203, got AQID

I get \x41514944 which is precisely what I would expect since it what this query results in as well:

select 'AQID'::bytea;

David J.

Re: JSON_VALUE() behavior when RETURNING bytea (expected base64 decoding)

From
Greg Sabino Mullane
Date:
It looks like your bytea_output is set to 'escape', which would explain what you are seeing. Try adding this in first:

SET bytea_output = hex;
SELECT JSON_VALUE(jsonb '"AQID"', '$' RETURNING bytea);

That (hex) is the default value, so you must be setting it to escape somewhere. You can see where by running:

select * from pg_settings where name = 'bytea_output';

Examine the source, sourcefile, and sourceline columns 

I personally prefer hex, but perhaps it's set to octet in your system for a reason. If not, maybe change it globally?

Cheers,
Greg

--
Enterprise Postgres Software Products & Tech Support


SELECT JSON_VALUE(jsonb '"AQID"', '$' RETURNING bytea); -- Expected 0x010203, got AQID

I get \x41514944 which is precisely what I would expect since it what this query results in as well:

select 'AQID'::bytea;

If the behavior of RETURNING is meant to be identical to that of simply applying a cast, is there any actual advantage in using JSON_VALUE with RETURNING? In other words, why not just do JSON_VALUE(json '"AQID"', '$')::bytea instead of using RETURNING? I thought the point was precisely for RETURNING to be able to perform JSON-specific conversions (e.g. take into account that the base64 is being converted from a *JSON* string, and therefore apply base64 decoding to it).

Re: JSON_VALUE() behavior when RETURNING bytea (expected base64 decoding)

From
"David G. Johnston"
Date:


On Wednesday, March 5, 2025, Shay Rojansky <roji@roji.org> wrote:

SELECT JSON_VALUE(jsonb '"AQID"', '$' RETURNING bytea); -- Expected 0x010203, got AQID

I get \x41514944 which is precisely what I would expect since it what this query results in as well:

select 'AQID'::bytea;

If the behavior of RETURNING is meant to be identical to that of simply applying a cast, is there any actual advantage in using JSON_VALUE with RETURNING? In other words, why not just do JSON_VALUE(json '"AQID"', '$')::bytea instead of using RETURNING? I thought the point was precisely for RETURNING to be able to perform JSON-specific conversions (e.g. take into account that the base64 is being converted from a *JSON* string, and therefore apply base64 decoding to it).

Not really…it does seem to just be syntactic sugar.  Not that we’d be likely to assume the contents of a JSON string are a base64 encoding as it is just, as you claim, a de-facto standard.  Unless we have some standard (namely the one defining json_value) telling us that the contents are indeed always base64 encoded data we’ll just assume it’s plain text and act accordingly - in this case passing it into bytea’s input function.

David J.

If the behavior of RETURNING is meant to be identical to that of simply applying a cast, is there any actual advantage in using JSON_VALUE with RETURNING? In other words, why not just do JSON_VALUE(json '"AQID"', '$')::bytea instead of using RETURNING? I thought the point was precisely for RETURNING to be able to perform JSON-specific conversions (e.g. take into account that the base64 is being converted from a *JSON* string, and therefore apply base64 decoding to it).

Not really…it does seem to just be syntactic sugar.  Not that we’d be likely to assume the contents of a JSON string are a base64 encoding as it is just, as you claim, a de-facto standard.  Unless we have some standard (namely the one defining json_value) telling us that the contents are indeed always base64 encoded data we’ll just assume it’s plain text and act accordingly - in this case passing it into bytea’s input function.

OK. For whatever it's worth, I'll note that SQL Server's OPENJSON does do this (so when a JSON string property is extracted as a binary type, base64 encoding is assumed). Other databases also have very specific documented conversion rules for JSON_VALUE RETURNING (Oracle, DB2 (table 1)). I'm basically trying to show that RETURNING definitely isn't a simple cast-from-string in other databases, but is a distinct conversion mechanism that takes into account the fact the the origin data comes from JSON.

JSON is of course a very light on formal/official standards, but some very strong de-facto standards have established themselves (e.g. ISO8601 for timestamps), and even beyond JSON, base64 seems to be the de-facto standard for encoding binary data as a string (which is what this is about). I'll also point out again that if the user really is looking only to get a string out and apply regular PG convert-from-string casting, they can do just that (i.e. omit RETURNING and apply regular PG casting); to me that points to RETURNING doing something beyond that, adding JSON-specific usefulness.

Re: JSON_VALUE() behavior when RETURNING bytea (expected base64 decoding)

From
Peter Eisentraut
Date:
On 06.03.25 11:55, Shay Rojansky wrote:
> For whatever it's worth, I'll note that SQL Server's OPENJSON does do 
> this (so when a JSON string property is extracted as a binary type, 
> base64 encoding is assumed). Other databases also have very specific 
> documented conversion rules for JSON_VALUE RETURNING (Oracle <https:// 
> docs.oracle.com/en/database/oracle/oracle-database/19/adjsn/clauses- 
> used-in-functions-and-conditions-for-json.html#GUID- 
> DE9F29D3-1C23-4271-9DCD-E585866576D2>, DB2 <https://www.ibm.com/docs/en/ 
> i/7.3?topic=functions-json-table#rbafzscajsontable__json_result> (table 
> 1)). I'm basically trying to show that RETURNING definitely isn't a 
> simple cast-from-string in other databases, but is a distinct conversion 
> mechanism that takes into account the fact the the origin data comes 
> from JSON.

According to the SQL standard, once you account for various special 
cases (non-scalar values, null values), it comes down to a cast.




> For whatever it's worth, I'll note that SQL Server's OPENJSON does do
> this (so when a JSON string property is extracted as a binary type,
> base64 encoding is assumed). Other databases also have very specific
> documented conversion rules for JSON_VALUE RETURNING (Oracle <https://
> docs.oracle.com/en/database/oracle/oracle-database/19/adjsn/clauses-
> used-in-functions-and-conditions-for-json.html#GUID-
> DE9F29D3-1C23-4271-9DCD-E585866576D2>, DB2 <https://www.ibm.com/docs/en/
> i/7.3?topic=functions-json-table#rbafzscajsontable__json_result> (table
> 1)). I'm basically trying to show that RETURNING definitely isn't a
> simple cast-from-string in other databases, but is a distinct conversion
> mechanism that takes into account the fact the the origin data comes
> from JSON.

According to the SQL standard, once you account for various special
cases (non-scalar values, null values), it comes down to a cast.

OK. I don't have the SQL standard here, but I'll just note that this doesn't seem to be what most/all other databases are doing - there's maybe room for interpretation there (but again, I have no idea). Applying certain transformations where needed certainly seems like the more useful thing to do, like in this case.