On 2024-12-16 Mon 10:09 AM, Joel Jacobson wrote:
> Hi hackers,
>
> After further consideration, I'm withdrawing the patch.
> Some fundamental questions remain unresolved:
>
> - Should round-trip fidelity be a strict goal? By "round-trip fidelity",
> I mean that data exported and then re-imported should yield exactly
> the original values, including the distinction between NULL and empty strings.
> - If round-trip fidelity is a requirement, how do we distinguish NULL from empty
> strings without delimiters or escapes?
> - Is automatic newline detection (as in "csv" and "text") more valuable than
> the ability to embed \r (CR) characters?
> - Would it be better to extend the existing COPY options rather than introducing
> a new format?
> - Or should we consider a JSONL format instead, one that avoids the NULL/empty
> string problem entirely?
>
> No clear solution or consensus has emerged. For now, I'll step back from the
> proposal. If someone wants to revisit this later, I'd be happy to contribute.
>
> Thanks again for all the feedback and consideration.
>
We seem to have got seriously into the weeds here. I'd be sorry to see
this dropped. After all, it's not something new, and while we have a
sort of workaround for "one json doc per line", it's far from obvious
and, except in a few blog posts, undocumented.

I think we're trying to be far too general here, without having any
more general use cases to justify it. The ones I recall having
encountered in the wild are:
. one json datum per line
. one json document per file
. a sequence of json documents per file

The last one is hard to deal with, and I think I've only seen it once or
twice, so I suggest leaving it aside for now.

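To make the difference between the first and the last case concrete, here
is a rough sketch in Python (chosen only for illustration; the function
names are made up and this says nothing about how COPY would implement
it). One datum per line only needs a split on newlines, whereas a sequence
of documents needs a parser that can report where each document ends.

```python
import json


def read_jsonl(text):
    """One JSON datum per line: record boundaries are known before parsing."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a trailing newline terminates the last record
    return [json.loads(line) for line in lines]


def read_json_sequence(text):
    """A sequence of JSON documents: boundaries are only known after parsing."""
    decoder = json.JSONDecoder()
    docs, pos = [], 0
    while pos < len(text):
        # raw_decode() does not skip leading whitespace, so do it by hand
        while pos < len(text) and text[pos] in " \t\r\n":
            pos += 1
        if pos == len(text):
            break
        doc, pos = decoder.raw_decode(text, pos)
        docs.append(doc)
    return docs
```

With input such as '{"a": 1} [1,\n 2]\n"x"', read_json_sequence() finds
three documents, while read_jsonl() fails on the first line because a
document spans a newline.
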
Notice these are all JSON. I could imagine XML might have similar
requirements, but I encounter it extremely rarely.

Regarding NULL, an empty string is not a valid JSON literal, so there
should be no confusion there. It is valid for XML, though.

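A quick illustration of the NULL point, as a Python sketch with made-up
names. Since no valid JSON value is zero characters long, an empty line
could stand for SQL NULL without colliding with the empty string, which
is written "". That an empty line would be the NULL marker is an
assumption of this sketch, not something agreed in this thread.

```python
import json

SQL_NULL = object()  # hypothetical stand-in for SQL NULL, distinct from JSON null


def encode_value(value):
    """Render one value as a JSONL line; SQL NULL becomes an empty line (assumed)."""
    if value is SQL_NULL:
        return ""
    return json.dumps(value)  # default dumps() output contains no raw newlines


def decode_line(line):
    """Inverse of encode_value(): an empty line is read back as SQL NULL."""
    if line == "":
        return SQL_NULL
    return json.loads(line)
```

Here SQL NULL, the empty string, and JSON null are written as an empty
line, "" and null respectively, so all three survive a round trip.
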
Given all that, I think restricting ourselves to just the JSON cases, and
possibly just to JSONL, would be perfectly reasonable.

Regarding CR, it's not a valid character in a JSON string item, although
it is valid in JSON whitespace. I would not treat it as magical unless
it immediately precedes an NL. That gives rise to a very slight
ambiguity, but I think it's one we could live with.

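The CR rule could be sketched like this (Python again, hypothetical
function name). A CR is dropped only when it immediately precedes the NL;
anywhere else it is handed to the JSON parser, which accepts it as
whitespace and rejects it inside a string. The ambiguity is that a datum
ending in a CR as trailing whitespace looks the same as a line with a
CRLF ending, but since that CR could only be whitespace, the parsed value
is the same either way.

```python
import json


def split_records(data):
    """Split on NL, treating a CR as part of the line ending only if it
    immediately precedes the NL. Any other CR is left in the record."""
    records = data.replace("\r\n", "\n").split("\n")
    if records and records[-1] == "":
        records.pop()  # a trailing NL terminates the last record
    return records


data = '{"a": 1}\r\n[1,\r2]\n"x"\r\n'
values = [json.loads(r) for r in split_records(data)]
```

The bare CR inside the array on the second line is kept and parsed as
whitespace, and a CR at the very end of the input with no NL after it is
not stripped.
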
As for what the format is called, I don't like the "LIST" proposal much,
even for the general case. Seems too close to an array.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com