Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers
From | Sutou Kouhei |
---|---|
Subject | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date | |
Msg-id | 20240222.183948.518018047578925034.kou@clear-code.com Whole thread Raw |
In response to | Re: Make COPY format extendable: Extract COPY TO format implementations (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: Make COPY format extendable: Extract COPY TO format implementations
Re: Make COPY format extendable: Extract COPY TO format implementations |
List | pgsql-hackers |
Hi, In <ZdbtQJ-p5H1_EDwE@paquier.xyz> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 22 Feb 2024 15:44:16 +0900, Michael Paquier <michael@paquier.xyz> wrote: > I was comparing what you have here, and what's been attached by Andres > at [1] and the top of the changes on my development branch at [2] > (v3-0008, mostly). And, it strikes me that there is no need to do any > major changes in any of the callbacks proposed up to v13 and v14 in > this thread, as all the changes proposed want to plug in more data > into each StateData for COPY FROM and COPY TO, the best part being > that v3-0008 can just reuse the proposed callbacks as-is. v1-0001 > from Sutou-san would need one slight tweak in the per-row callback, > still that's minor. I think so too. But I thought that some minor conflicts will be happen with this and the v15. So I worked on this before the v15. We agreed that this optimization doesn't block v15: [1] So we can work on the v15 without this optimization for now. [1] https://www.postgresql.org/message-id/flat/20240219195351.5vy7cdl3wxia66kg%40awork3.anarazel.de#20f9677e074fb0f8c5bb3994ef059a15 > I have been spending more time on the patch to introduce the COPY > APIs, leading me to the v15 attached, where I have replaced the > previous attribute callbacks for the output representation and the > reads with hardcoded routines that should be optimized by compilers, > and I have done more profiling with -O2. Thanks! I wanted to work on it but I didn't have enough time for it in a few days... I've reviewed the v15. ---- > @@ -751,8 +751,9 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes) > * > * NOTE: force_not_null option are not applied to the returned fields. > */ > -bool > -NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) > +static bool "inline" is missing here. > +NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, > + bool is_csv) > { > int fldct; ---- How about adding "is_csv" to CopyReadline() and CopyReadLineText() too? ---- diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 25b8d4bc52..79fabecc69 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -150,8 +150,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0"; /* non-export function prototypes */ -static bool CopyReadLine(CopyFromState cstate); -static bool CopyReadLineText(CopyFromState cstate); +static inline bool CopyReadLine(CopyFromState cstate, bool is_csv); +static inline bool CopyReadLineText(CopyFromState cstate, bool is_csv); static inline int CopyReadAttributesText(CopyFromState cstate); static inline int CopyReadAttributesCSV(CopyFromState cstate); static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo, @@ -770,7 +770,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, tupDesc = RelationGetDescr(cstate->rel); cstate->cur_lineno++; - done = CopyReadLine(cstate); + done = CopyReadLine(cstate, is_csv); if (cstate->opts.header_line == COPY_HEADER_MATCH) { @@ -823,7 +823,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, cstate->cur_lineno++; /* Actually read the line into memory here */ - done = CopyReadLine(cstate); + done = CopyReadLine(cstate, is_csv); /* * EOF at start of line means we're done. If we see EOF after some @@ -1133,8 +1133,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext, * by newline. The terminating newline or EOF marker is not included * in the final value of line_buf. */ -static bool -CopyReadLine(CopyFromState cstate) +static inline bool +CopyReadLine(CopyFromState cstate, bool is_csv) { bool result; @@ -1142,7 +1142,7 @@ CopyReadLine(CopyFromState cstate) cstate->line_buf_valid = false; /* Parse data and transfer into line_buf */ - result = CopyReadLineText(cstate); + result = CopyReadLineText(cstate, is_csv); if (result) { @@ -1209,8 +1209,8 @@ CopyReadLine(CopyFromState cstate) /* * CopyReadLineText - inner loop of CopyReadLine for text mode */ -static bool -CopyReadLineText(CopyFromState cstate) +static inline bool +CopyReadLineText(CopyFromState cstate, bool is_csv) { char *copy_input_buf; int input_buf_ptr; @@ -1226,7 +1226,7 @@ CopyReadLineText(CopyFromState cstate) char quotec = '\0'; char escapec = '\0'; - if (cstate->opts.csv_mode) + if (is_csv) { quotec = cstate->opts.quote[0]; escapec = cstate->opts.escape[0]; @@ -1306,7 +1306,7 @@ CopyReadLineText(CopyFromState cstate) prev_raw_ptr = input_buf_ptr; c = copy_input_buf[input_buf_ptr++]; - if (cstate->opts.csv_mode) + if (is_csv) { /* * If character is '\\' or '\r', we may need to look ahead below. @@ -1345,7 +1345,7 @@ CopyReadLineText(CopyFromState cstate) } /* Process \r */ - if (c == '\r' && (!cstate->opts.csv_mode || !in_quote)) + if (c == '\r' && (!is_csv || !in_quote)) { /* Check for \r\n on first line, _and_ handle \r\n. */ if (cstate->eol_type == EOL_UNKNOWN || @@ -1373,10 +1373,10 @@ CopyReadLineText(CopyFromState cstate) if (cstate->eol_type == EOL_CRNL) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - !cstate->opts.csv_mode ? + !is_csv ? errmsg("literal carriage return found in data") : errmsg("unquoted carriage return found in data"), - !cstate->opts.csv_mode ? + !is_csv ? errhint("Use \"\\r\" to represent carriage return.") : errhint("Use quoted CSV field to represent carriage return."))); @@ -1390,10 +1390,10 @@ CopyReadLineText(CopyFromState cstate) else if (cstate->eol_type == EOL_NL) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - !cstate->opts.csv_mode ? + !is_csv ? errmsg("literal carriage return found in data") : errmsg("unquoted carriage return found in data"), - !cstate->opts.csv_mode ? + !is_csv ? errhint("Use \"\\r\" to represent carriage return.") : errhint("Use quoted CSV field to represent carriage return."))); /* If reach here, we have found the line terminator */ @@ -1401,15 +1401,15 @@ CopyReadLineText(CopyFromState cstate) } /* Process \n */ - if (c == '\n' && (!cstate->opts.csv_mode || !in_quote)) + if (c == '\n' && (!is_csv || !in_quote)) { if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - !cstate->opts.csv_mode ? + !is_csv ? errmsg("literal newline found in data") : errmsg("unquoted newline found in data"), - !cstate->opts.csv_mode ? + !is_csv ? errhint("Use \"\\n\" to represent newline.") : errhint("Use quoted CSV field to represent newline."))); cstate->eol_type = EOL_NL; /* in case not set yet */ @@ -1421,7 +1421,7 @@ CopyReadLineText(CopyFromState cstate) * In CSV mode, we only recognize \. alone on a line. This is because * \. is a valid CSV data value. */ - if (c == '\\' && (!cstate->opts.csv_mode || first_char_in_line)) + if (c == '\\' && (!is_csv || first_char_in_line)) { char c2; @@ -1454,7 +1454,7 @@ CopyReadLineText(CopyFromState cstate) if (c2 == '\n') { - if (!cstate->opts.csv_mode) + if (!is_csv) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), errmsg("end-of-copy marker does not match previous newline style"))); @@ -1463,7 +1463,7 @@ CopyReadLineText(CopyFromState cstate) } else if (c2 != '\r') { - if (!cstate->opts.csv_mode) + if (!is_csv) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), errmsg("end-of-copy marker corrupt"))); @@ -1479,7 +1479,7 @@ CopyReadLineText(CopyFromState cstate) if (c2 != '\r' && c2 != '\n') { - if (!cstate->opts.csv_mode) + if (!is_csv) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), errmsg("end-of-copy marker corrupt"))); @@ -1508,7 +1508,7 @@ CopyReadLineText(CopyFromState cstate) result = true; /* report EOF */ break; } - else if (!cstate->opts.csv_mode) + else if (!is_csv) { /* * If we are here, it means we found a backslash followed by ---- > In this case, I have been able to limit the effects of the per-row > callback by making NextCopyFromRawFields() local to copyfromparse.c > while applying some inlining to it. This brings me to a different > point, why don't we do this change independently on HEAD? Does this mean that changing bool NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) to (adding "static") static bool NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) not (adding "static" and "bool is_csv") static bool NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv) improves performance? If so, adding the change independently on HEAD makes sense. But I don't know why that improves performance... Inlining? > It's not > really complicated to make NextCopyFromRawFields show high in the > profiles. I was looking at external projects, and noticed that > there's nothing calling NextCopyFromRawFields() directly. It means that we can hide NextCopyFromRawFields() without breaking compatibility (because nobody uses it), right? If so, I also think that we can change NextCopyFromRawFields() directly. If we assume that someone (not public code) may use it, we can create a new internal function and use it something like: ---- diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 7cacd0b752..b1515ead82 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -751,8 +751,8 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes) * * NOTE: force_not_null option are not applied to the returned fields. */ -bool -NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) +static bool +NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields) { int fldct; bool done; @@ -840,6 +840,12 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) return true; } +bool +NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) +{ + return NextCopyFromRawFieldsInternal(cstate, fields, nfields); +} + /* * Read next tuple from file for COPY FROM. Return false if no more tuples. * ---- Thanks, -- kou
pgsql-hackers by date: