Re: COPY-able csv log outputs - Mailing list pgsql-patches
From | Andrew Dunstan |
---|---|
Subject | Re: COPY-able csv log outputs |
Date | |
Msg-id | 4667078B.6020705@dunslane.net Whole thread Raw |
In response to | Re: COPY-able csv log outputs (Greg Smith <gsmith@gregsmith.com>) |
List | pgsql-patches |
Greg Smith wrote: > The attached patch fixes all the issues I found in the original > version of this code and completes the review I wanted to do. Someone > else will need to take this from here. As I already mentioned, I > can't comment on the quality of the piping implementation used to add > this feature other than to say it worked for me. The code below strikes me as being seriously broken. The comment says that it doubles all single quotes, double quotes and backslashes. It doesn't (backslashes are not doubled) and in any case doing that would simply be wrong (with or without backslashes). The ONLY things that should be doubled are double quotes. Period. That part is easy to fix. But here's my question: why are we worrying at all about things like the encoding? Especially, why are we worrying about the *client* encoding for a server log file? We surely aren't going to switch encodings in the middle of a log file! ISTM that this routine should simply copy the input, byte for byte, apart from doubling the dquotes. Does that make sense? I assume that this routine has been copied from somewhere else without sufficient regard to the context (or the logic). The code also illustrates something else that annoyed me elsewhere in the patch and that I have eliminated, namely use of a macro placed in the header file and then used in exactly one place in the code. The macro didn't make anything clearer - quite the reverse. ISTM that a macro used only in one file should be defined in that file, generally, and if it's used only once is probably not much use anyway. cheers andrew > + > + /* > + * Escapes special characters in the string to conform > + * with the csv type output. > + * Replaces " with "", ' with '' and \ with \\. > + */ > + static size_t > + escape_string_literal(char *to, const char *from) > + { > + const char *source = from; > + char *target = to; > + size_t remaining = 0; > + int client_encoding = 0; > + > + if (from == NULL) > + return remaining; > + > + remaining = strlen(from); > + client_encoding = pg_get_client_encoding(); > + > + while (remaining > 0 && *source != '\0') > + { > + char c = *source; > + int len; > + int i; > + > + /* Fast path for plain ASCII */ > + if (!IS_HIGHBIT_SET(c)) > + { > + /* Apply quoting if needed */ > + if (CSV_STR_DOUBLE(c, false)) > + *target++ = c; > + /* Copy the character */ > + *target++ = c; > + source++; > + remaining--; > + continue; > + } > + > + /* Slow path for possible multibyte characters */ > + len = pg_encoding_mblen(client_encoding, source); > + > + /* Copy the character */ > + for (i = 0; i < len; i++) > + { > + if (remaining == 0 || *source == '\0') > + break; > + *target++ = *source++; > + remaining--; > + } > + > + /* > + * If we hit premature end of string (ie, incomplete multibyte > + * character), try to pad out to the correct length with spaces. > + * We may not be able to pad completely, but we will always be > + * able to insert at least one pad space (since we'd not have > + * quoted a multibyte character). This should be enough to make > + * a string that the server will error out on. > + */ > + if (i < len) > + { > + for (; i < len; i++) > + { > + if (((size_t) (target - to)) / 2 >= strlen(from)) > + break; > + *target++ = ' '; > + } > + break; > + } > + } > + > + /* Write the terminating NUL character. */ > + *target = '\0'; > + > + return target - to; > + } > >
pgsql-patches by date: