Re: [PATCH] Reject ENCODING option for COPY TO FORMAT JSON - Mailing list pgsql-hackers

From Ayush Tiwari
Subject Re: [PATCH] Reject ENCODING option for COPY TO FORMAT JSON
Date
Msg-id CAJTYsWU24n==-H_agoUguFxZjFV6jCzNamuJzj4ZgiFPUp2bRg@mail.gmail.com
In response to Re: [PATCH] Reject ENCODING option for COPY TO FORMAT JSON  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [PATCH] Reject ENCODING option for COPY TO FORMAT JSON
List pgsql-hackers
Hi,

On Mon, 20 Apr 2026 at 19:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ayush Tiwari <ayushtiwari.slg01@gmail.com> writes:
>> COPY TO FORMAT JSON silently accepts the ENCODING option but doesn't
>> perform encoding conversion(?)  CopyToJsonOneRow() sends the output of
>> composite_to_json() via CopySendData() without calling
>> pg_server_to_any(), unlike the text and CSV paths.
>
>>   COPY t TO '/tmp/out.json' WITH (FORMAT json, ENCODING 'LATIN1');
>
>> On a UTF-8 server this produces UTF-8 output, not LATIN1.
>
> Seems to me the correct thing here is to make it work like the other
> cases, ie perform pg_server_to_any().  I have exactly no sympathy for
> the argument about the RFC saying it must be UTF-8, not least because
> that's not in fact what is implemented (what if the server encoding
> isn't UTF-8?).

Agreed. I initially thought rejecting the option was the safer route
given the RFC, but as you pointed out, we don't strictly enforce
UTF-8 on the server side anyway.


> Rejecting this option altogether doesn't improve anything, not
> functionally, not specs-compliance-wise, nor according to the
> principle of least surprise.
 
Makes sense. Implementing the conversion properly
keeps the JSON format consistent with how the text and CSV formats behave.


>> The attached patch rejects the explicit ENCODING option for JSON
>> mode, consistent with how DELIMITER, NULL, DEFAULT, and HEADER are
>> already rejected.  The implicit client_encoding case is a separate
>> design question (should COPY TO JSON always emit UTF-8 regardless
>> of client_encoding?) that maybe we should address separately and not as
>> part of v19.
>
> No, you don't get to punt this till later.  Once we ship v19 there's
> going to be a strong expectation of backwards compatibility.
>
> The idea of sending UTF-8 to a client that's set client_encoding to
> something else would be risible, if it weren't a security hazard.

I agree that sending unconverted bytes to a client whose
client_encoding differs is a security hazard that needs addressing.
I had not considered the backward-compatibility aspect, my bad.

I have been trying out calling pg_server_to_any() on json_buf after
composite_to_json() returns, which covers both an explicit ENCODING
option and an implicit client_encoding mismatch.
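
A minimal sketch of that ordering, modeled in Python rather than in the
backend's C, so it is only an illustration and not the patch itself. The
helpers row_to_json_bytes() and server_to_any() are hypothetical stand-ins
for composite_to_json() and pg_server_to_any(); the server encoding is
assumed to be UTF-8 and the target LATIN1, as in the quoted example.

```python
import codecs
import json

SERVER_ENCODING = "utf-8"  # assumption: a UTF-8 server


def row_to_json_bytes(row: dict) -> bytes:
    """Hypothetical stand-in for composite_to_json(): JSON text in the server encoding."""
    # ensure_ascii=False keeps non-ASCII characters literal instead of \uXXXX escapes.
    return json.dumps(row, ensure_ascii=False).encode(SERVER_ENCODING)


def server_to_any(buf: bytes, target_encoding: str) -> bytes:
    """Hypothetical stand-in for pg_server_to_any(): transcode server bytes to the target."""
    # Compare canonical codec names so "UTF8" and "utf-8" are treated as equal.
    if codecs.lookup(target_encoding).name == codecs.lookup(SERVER_ENCODING).name:
        return buf  # same encoding, nothing to convert
    return buf.decode(SERVER_ENCODING).encode(target_encoding)


def copy_to_json_one_row(row: dict, target_encoding: str) -> bytes:
    """Build the JSON text first, then convert the whole buffer before sending it."""
    json_buf = row_to_json_bytes(row)
    return server_to_any(json_buf, target_encoding) + b"\n"


# What is sent without the conversion step versus with it.
unconverted = row_to_json_bytes({"name": "café"}) + b"\n"
converted = copy_to_json_one_row({"name": "café"}, "latin-1")
```

In this model the unconverted row carries the two-byte UTF-8 sequence for
"é" while the converted row carries the single LATIN1 byte. The target
encoding would come from the ENCODING option when one is given and from
client_encoding otherwise, so the same conversion step serves both cases.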

Let me send a patch with code and associated test cases.

Regards,
Ayush

 
