Re: COPY-able csv log outputs - Mailing list pgsql-patches

From Andrew Dunstan
Subject Re: COPY-able csv log outputs
Date
Msg-id 4667078B.6020705@dunslane.net
Whole thread Raw
In response to Re: COPY-able csv log outputs  (Greg Smith <gsmith@gregsmith.com>)
List pgsql-patches

Greg Smith wrote:
> The attached patch fixes all the issues I found in the original
> version of this code and completes the review I wanted to do.  Someone
> else will need to take this from here.  As I already mentioned, I
> can't comment on the quality of the piping implementation used to add
> this feature other than to say it worked for me.

The code below strikes me as being seriously broken.

The comment says that it doubles all single quotes, double quotes and
backslashes. It doesn't (backslashes are not doubled) and in any case
doing that would simply be wrong (with or without backslashes). The ONLY
things that should be doubled are double quotes. Period.

That part is easy to fix.

But here's my question: why are we worrying at all about things like the
encoding? Especially, why are we worrying about the *client* encoding
for a server log file? We surely aren't going to switch encodings in the
middle of a log file! ISTM that this routine should simply copy the
input, byte for byte, apart from doubling the dquotes. Does that make
sense? I assume that this routine has been copied from somewhere else
without sufficient regard to the context (or the logic).

The code also illustrates something else that annoyed me elsewhere in
the patch and that I have eliminated, namely use of a macro placed in
the header file and then used in exactly one place in the code. The
macro didn't make anything clearer - quite the reverse. ISTM that a
macro used only in one file should be defined in that file, generally,
and if it's used only once is probably not much use anyway.

cheers

andrew


> +
> + /*
> +  * Escapes special characters in the string to conform
> +  * with the csv type output.
> +  * Replaces " with "", ' with '' and \ with \\.
> +  */
> + static size_t
> + escape_string_literal(char *to, const char *from)
> + {
> +     const char *source = from;
> +     char       *target = to;
> +     size_t        remaining = 0;
> +     int    client_encoding = 0;
> +
> +     if (from == NULL)
> +         return remaining;
> +
> +     remaining = strlen(from);
> +     client_encoding = pg_get_client_encoding();
> +
> +     while (remaining > 0 && *source != '\0')
> +     {
> +         char        c = *source;
> +         int            len;
> +         int            i;
> +
> +         /* Fast path for plain ASCII */
> +         if (!IS_HIGHBIT_SET(c))
> +         {
> +             /* Apply quoting if needed */
> +             if (CSV_STR_DOUBLE(c, false))
> +                 *target++ = c;
> +             /* Copy the character */
> +             *target++ = c;
> +             source++;
> +             remaining--;
> +             continue;
> +         }
> +
> +         /* Slow path for possible multibyte characters */
> +         len = pg_encoding_mblen(client_encoding, source);
> +
> +         /* Copy the character */
> +         for (i = 0; i < len; i++)
> +         {
> +             if (remaining == 0 || *source == '\0')
> +                 break;
> +             *target++ = *source++;
> +             remaining--;
> +         }
> +
> +         /*
> +          * If we hit premature end of string (ie, incomplete multibyte
> +          * character), try to pad out to the correct length with spaces.
> +          * We may not be able to pad completely, but we will always be
> +          * able to insert at least one pad space (since we'd not have
> +          * quoted a multibyte character). This should be enough to make
> +          * a string that the server will error out on.
> +          */
> +         if (i < len)
> +         {
> +             for (; i < len; i++)
> +             {
> +                 if (((size_t) (target - to)) / 2 >= strlen(from))
> +                     break;
> +                 *target++ = ' ';
> +             }
> +             break;
> +         }
> +     }
> +
> +     /* Write the terminating NUL character. */
> +     *target = '\0';
> +
> +     return target - to;
> + }
>
>

pgsql-patches by date:

Previous
From: Hannes Eder
Date:
Subject: Re: [HACKERS] msvc, build and install with cygwin in the PATH
Next
From: Andrew Dunstan
Date:
Subject: WIP csv logs