Home > mailing lists
Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers

From	Sutou Kouhei
Subject	Re: Make COPY format extendable: Extract COPY TO format implementations
Date	February 22 12:39:48
Msg-id	20240222.183948.518018047578925034.kou@clear-code.com Whole thread Raw
In response to	Re: Make COPY format extendable: Extract COPY TO format implementations (Michael Paquier <michael@paquier.xyz>)
Responses	Re: Make COPY format extendable: Extract COPY TO format implementations Re: Make COPY format extendable: Extract COPY TO format implementations
List	pgsql-hackers
Tree view
Hi,

In <ZdbtQJ-p5H1_EDwE@paquier.xyz>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 22 Feb 2024 15:44:16 +0900,
  Michael Paquier <michael@paquier.xyz> wrote:

> I was comparing what you have here, and what's been attached by Andres
> at [1] and the top of the changes on my development branch at [2]
> (v3-0008, mostly).  And, it strikes me that there is no need to do any
> major changes in any of the callbacks proposed up to v13 and v14 in
> this thread, as all the changes proposed want to plug in more data
> into each StateData for COPY FROM and COPY TO, the best part being
> that v3-0008 can just reuse the proposed callbacks as-is.  v1-0001
> from Sutou-san would need one slight tweak in the per-row callback,
> still that's minor.

I think so too. But I thought that some minor conflicts will
be happen with this and the v15. So I worked on this before
the v15.

We agreed that this optimization doesn't block v15: [1]
So we can work on the v15 without this optimization for now.

[1]
https://www.postgresql.org/message-id/flat/20240219195351.5vy7cdl3wxia66kg%40awork3.anarazel.de#20f9677e074fb0f8c5bb3994ef059a15

> I have been spending more time on the patch to introduce the COPY
> APIs, leading me to the v15 attached, where I have replaced the
> previous attribute callbacks for the output representation and the
> reads with hardcoded routines that should be optimized by compilers,
> and I have done more profiling with -O2.

Thanks! I wanted to work on it but I didn't have enough time
for it in a few days...

I've reviewed the v15.

----
> @@ -751,8 +751,9 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
>   *
>   * NOTE: force_not_null option are not applied to the returned fields.
>   */
> -bool
> -NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
> +static bool

"inline" is missing here.

> +NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields,
> +                      bool is_csv)
>  {
>      int            fldct;
----

How about adding "is_csv" to CopyReadline() and
CopyReadLineText() too?

----
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 25b8d4bc52..79fabecc69 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -150,8 +150,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 
 /* non-export function prototypes */
-static bool CopyReadLine(CopyFromState cstate);
-static bool CopyReadLineText(CopyFromState cstate);
+static inline bool CopyReadLine(CopyFromState cstate, bool is_csv);
+static inline bool CopyReadLineText(CopyFromState cstate, bool is_csv);
 static inline int CopyReadAttributesText(CopyFromState cstate);
 static inline int CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -770,7 +770,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields,
         tupDesc = RelationGetDescr(cstate->rel);
 
         cstate->cur_lineno++;
-        done = CopyReadLine(cstate);
+        done = CopyReadLine(cstate, is_csv);
 
         if (cstate->opts.header_line == COPY_HEADER_MATCH)
         {
@@ -823,7 +823,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields,
     cstate->cur_lineno++;
 
     /* Actually read the line into memory here */
-    done = CopyReadLine(cstate);
+    done = CopyReadLine(cstate, is_csv);
 
     /*
      * EOF at start of line means we're done.  If we see EOF after some
@@ -1133,8 +1133,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
  * by newline.  The terminating newline or EOF marker is not included
  * in the final value of line_buf.
  */
-static bool
-CopyReadLine(CopyFromState cstate)
+static inline bool
+CopyReadLine(CopyFromState cstate, bool is_csv)
 {
     bool        result;
 
@@ -1142,7 +1142,7 @@ CopyReadLine(CopyFromState cstate)
     cstate->line_buf_valid = false;
 
     /* Parse data and transfer into line_buf */
-    result = CopyReadLineText(cstate);
+    result = CopyReadLineText(cstate, is_csv);
 
     if (result)
     {
@@ -1209,8 +1209,8 @@ CopyReadLine(CopyFromState cstate)
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
-static bool
-CopyReadLineText(CopyFromState cstate)
+static inline bool
+CopyReadLineText(CopyFromState cstate, bool is_csv)
 {
     char       *copy_input_buf;
     int            input_buf_ptr;
@@ -1226,7 +1226,7 @@ CopyReadLineText(CopyFromState cstate)
     char        quotec = '\0';
     char        escapec = '\0';
 
-    if (cstate->opts.csv_mode)
+    if (is_csv)
     {
         quotec = cstate->opts.quote[0];
         escapec = cstate->opts.escape[0];
@@ -1306,7 +1306,7 @@ CopyReadLineText(CopyFromState cstate)
         prev_raw_ptr = input_buf_ptr;
         c = copy_input_buf[input_buf_ptr++];
 
-        if (cstate->opts.csv_mode)
+        if (is_csv)
         {
             /*
              * If character is '\\' or '\r', we may need to look ahead below.
@@ -1345,7 +1345,7 @@ CopyReadLineText(CopyFromState cstate)
         }
 
         /* Process \r */
-        if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+        if (c == '\r' && (!is_csv || !in_quote))
         {
             /* Check for \r\n on first line, _and_ handle \r\n. */
             if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1373,10 +1373,10 @@ CopyReadLineText(CopyFromState cstate)
                     if (cstate->eol_type == EOL_CRNL)
                         ereport(ERROR,
                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                                 !cstate->opts.csv_mode ?
+                                 !is_csv ?
                                  errmsg("literal carriage return found in data") :
                                  errmsg("unquoted carriage return found in data"),
-                                 !cstate->opts.csv_mode ?
+                                 !is_csv ?
                                  errhint("Use \"\\r\" to represent carriage return.") :
                                  errhint("Use quoted CSV field to represent carriage return.")));
 
@@ -1390,10 +1390,10 @@ CopyReadLineText(CopyFromState cstate)
             else if (cstate->eol_type == EOL_NL)
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errmsg("literal carriage return found in data") :
                          errmsg("unquoted carriage return found in data"),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errhint("Use \"\\r\" to represent carriage return.") :
                          errhint("Use quoted CSV field to represent carriage return.")));
             /* If reach here, we have found the line terminator */
@@ -1401,15 +1401,15 @@ CopyReadLineText(CopyFromState cstate)
         }
 
         /* Process \n */
-        if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+        if (c == '\n' && (!is_csv || !in_quote))
         {
             if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errmsg("literal newline found in data") :
                          errmsg("unquoted newline found in data"),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errhint("Use \"\\n\" to represent newline.") :
                          errhint("Use quoted CSV field to represent newline.")));
             cstate->eol_type = EOL_NL;    /* in case not set yet */
@@ -1421,7 +1421,7 @@ CopyReadLineText(CopyFromState cstate)
          * In CSV mode, we only recognize \. alone on a line.  This is because
          * \. is a valid CSV data value.
          */
-        if (c == '\\' && (!cstate->opts.csv_mode || first_char_in_line))
+        if (c == '\\' && (!is_csv || first_char_in_line))
         {
             char        c2;
 
@@ -1454,7 +1454,7 @@ CopyReadLineText(CopyFromState cstate)
 
                     if (c2 == '\n')
                     {
-                        if (!cstate->opts.csv_mode)
+                        if (!is_csv)
                             ereport(ERROR,
                                     (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                                      errmsg("end-of-copy marker does not match previous newline style")));
@@ -1463,7 +1463,7 @@ CopyReadLineText(CopyFromState cstate)
                     }
                     else if (c2 != '\r')
                     {
-                        if (!cstate->opts.csv_mode)
+                        if (!is_csv)
                             ereport(ERROR,
                                     (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                                      errmsg("end-of-copy marker corrupt")));
@@ -1479,7 +1479,7 @@ CopyReadLineText(CopyFromState cstate)
 
                 if (c2 != '\r' && c2 != '\n')
                 {
-                    if (!cstate->opts.csv_mode)
+                    if (!is_csv)
                         ereport(ERROR,
                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                                  errmsg("end-of-copy marker corrupt")));
@@ -1508,7 +1508,7 @@ CopyReadLineText(CopyFromState cstate)
                 result = true;    /* report EOF */
                 break;
             }
-            else if (!cstate->opts.csv_mode)
+            else if (!is_csv)
             {
                 /*
                  * If we are here, it means we found a backslash followed by
----

> In this case, I have been able to limit the effects of the per-row
> callback by making NextCopyFromRawFields() local to copyfromparse.c
> while applying some inlining to it.  This brings me to a different
> point, why don't we do this change independently on HEAD?

Does this mean that changing

bool NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)

to (adding "static")

static bool NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)

not  (adding "static" and "bool is_csv")

static bool NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)

improves performance?

If so, adding the change independently on HEAD makes
sense. But I don't know why that improves
performance... Inlining?

>                                                            It's not 
> really complicated to make NextCopyFromRawFields show high in the
> profiles.  I was looking at external projects, and noticed that
> there's nothing calling NextCopyFromRawFields() directly.

It means that we can hide NextCopyFromRawFields() without
breaking compatibility (because nobody uses it), right?

If so, I also think that we can change
NextCopyFromRawFields() directly.

If we assume that someone (not public code) may use it, we
can create a new internal function and use it something
like:

----
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 7cacd0b752..b1515ead82 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -751,8 +751,8 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
  *
  * NOTE: force_not_null option are not applied to the returned fields.
  */
-bool
-NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+static bool
+NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields)
 {
     int            fldct;
     bool        done;
@@ -840,6 +840,12 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     return true;
 }
 
+bool
+NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+{
+    return NextCopyFromRawFieldsInternal(cstate, fields, nfields);
+}
+
 /*
  * Read next tuple from file for COPY FROM. Return false if no more tuples.
  *
----


Thanks,
-- 
kou
pgsql-hackers by date:
From: Andrew Dunstan
Date: 22 February, 12:38:32
Subject: Re: WIP Incremental JSON Parser
From: Ashutosh Bapat
Date: 22 February, 12:55:42
Subject: Re: Test to dump and restore objects left behind by regression
Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers

Previous

Next