On 2024-10-08 Tu 3:25 AM, Joel Jacobson wrote:
> On Sun, Oct 6, 2024, at 15:12, Andrew Dunstan wrote:
>> On 2024-10-04 Fr 12:19 PM, Joel Jacobson wrote:
>>> 2. Avoid needing hacks like using E'\x01' as quoting char.
>>>
>>> Introduce QUOTE NONE and DELIMITER NONE,
>>> to allow raw lines to be imported "as is" into a single text column.
>> As I think I previously indicated, I'm perfectly happy about 2, because
>> it replaces a far from obvious hack, but I am at best dubious about 1.
> I've looked at how to implement this, and there is quite a lot of complexity
> having to do with quoting and escaping.
>
> Need guidance on what you think would be best to do:
>
> 2a) Should we aim to support all NONE combinations, at the cost of increasing the
> complexity in all code having to do with quoting, escaping and delimiters?
>
> 2b) Should we aim to only support the QUOTE NONE DELIMITER NONE ESCAPE NONE case,
> which is useful for the real-life scenario we've identified, that is, importing raw log
> lines into a single column, and which could then be handled by a much simpler and
> probably faster version of CopyReadAttributesCSV(),
> e.g. named CopyReadAttributesUnquotedUnDelimited() or
> maybe CopyReadAttributesRaw()?
> (We also need to modify CopyReadLineText(), but it seems we only need a
> quote_none bool to skip over the quoting code there, so I don't think a
> separate function is warranted there.)
>
> I think ESCAPE NONE should be implied from QUOTE NONE, since the default escape
> character is the same as the quote character, so if there isn't any
> quote character, then I think that would imply no escape character either.
>
> Can we think of any other valid, useful, realistic, and safe combinations of
> QUOTE NONE, DELIMITER NONE and ESCAPE NONE, that would be interesting
> to support?
>
> If not, then I think 2b looks more interesting: it reduces the risk of accidental
> misuse, has a simpler implementation, and should also allow importing
> raw log files faster, thanks to the reduced complexity.
>
Offhand I can't think of a case other than 2b that would apply in the
real world, although others might like to chime in here. If we're going
to do that, let's find a shorter way to spell it. In fact, we should do
that even if we go with 2a.
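For concreteness, here is a minimal C sketch of what the 2b fast path might amount to. It is not PostgreSQL's COPY code: RawCopyOptions, resolve_escape(), is_raw_mode() and read_raw_attribute() are hypothetical names, a NUL char is assumed to stand for NONE, and line splitting is reduced to finding '\n' (with an optional preceding '\r'). It shows ESCAPE defaulting to QUOTE, so QUOTE NONE implies ESCAPE NONE, and the whole line becoming the single attribute:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical option state for the proposed raw mode (illustrative only,
 * not PostgreSQL's CopyFormatOptions). '\0' stands for NONE.
 */
typedef struct RawCopyOptions
{
    char quote;      /* '\0' means QUOTE NONE */
    char delim;      /* '\0' means DELIMITER NONE */
    char escape;     /* '\0' means ESCAPE NONE */
    bool escape_set; /* true if ESCAPE was given explicitly */
} RawCopyOptions;

/* ESCAPE defaults to QUOTE, so QUOTE NONE implies ESCAPE NONE. */
static void
resolve_escape(RawCopyOptions *opts)
{
    if (!opts->escape_set)
        opts->escape = opts->quote;
}

/* The fast path applies only when all three options are NONE (case 2b). */
static bool
is_raw_mode(const RawCopyOptions *opts)
{
    return opts->quote == '\0' && opts->delim == '\0' && opts->escape == '\0';
}

/*
 * Hypothetical raw reader: the line, minus its "\n" or "\r\n" terminator,
 * becomes the single attribute; no quote, escape or delimiter scanning.
 * Returns the attribute length and sets *next to the start of the next
 * line (or to the end of the buffer if no newline is found).
 */
static size_t
read_raw_attribute(const char *buf, size_t len,
                   const char **attr, const char **next)
{
    const char *nl;
    size_t attr_len;

    assert(buf != NULL && attr != NULL && next != NULL);

    nl = memchr(buf, '\n', len);
    attr_len = nl ? (size_t) (nl - buf) : len;

    /* Drop a '\r' only when it directly precedes the '\n'. */
    if (nl && attr_len > 0 && buf[attr_len - 1] == '\r')
        attr_len--;

    *attr = buf;
    *next = nl ? nl + 1 : buf + len;
    return attr_len;
}
```

In this sketch, once all three options are NONE, the per-line work is a single memchr() for the terminator, with no per-byte quote or escape state to track; quote and backslash characters in the log line pass through untouched.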
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com