Re: Add header support to text format and matching feature - Mailing list pgsql-hackers

From Daniel Verite
Subject Re: Add header support to text format and matching feature
Date
Msg-id 80a9b594-01d6-4ee4-a612-9ae9fd3c1194@manitou-mail.org
Whole thread Raw
In response to Re: Add header support to text format and matching feature  (Rémi Lapeyre <remi.lapeyre@lenstra.fr>)
Responses Re: Add header support to text format and matching feature  (vignesh C <vignesh21@gmail.com>)
List pgsql-hackers
    Rémi Lapeyre wrote:

> Here’s a new version that fix all the issues.

Here's a review of v4.

The patch improves COPY in two ways:

- COPY TO and COPY FROM now accept "HEADER ON" for the TEXT format
(previously it was only for CSV)

- COPY FROM also accepts "HEADER match" to tell that there's a header
and that its contents must match the columns of the destination table.
This works for both the CSV and TEXT formats. The syntax for the
columns is the same as for the data and the match is case-sensitive.

The first improvement when submitted alone (in 2018 by Simon Muller)
has been judged not useful enough or even hazardous without any
"match" feature. It was returned with feedback in 2018-10 and
resubmitted by Rémi in 2020-02 with the match feature.

The patches  apply cleanly, "make check" and "make check-world" pass.

In my tests it works fine except for one crash that I can reproduce
on a fresh build and default configuration with:

$ cat >file.txt
i
1

$ psql
postgres=# create table x(i int);
CREATE TABLE
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
PANIC:    ERRORDATA_STACK_SIZE exceeded
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I suspect the reason is the way that PG_TRY/PG_CATCH is used, see below.

Code comments:


+/*
+ * Represents whether the header must be absent, present or present and
match.
+ */
+typedef enum CopyHeader
+{
+    COPY_HEADER_ABSENT,
+    COPY_HEADER_PRESENT,
+    COPY_HEADER_MATCH
+} CopyHeader;
+
 /*
  * This struct contains all the state variables used throughout a COPY
  * operation. For simplicity, we use the same struct for all variants of
COPY,
@@ -136,7 +146,7 @@ typedef struct CopyStateData
    bool        binary;         /* binary format? */
    bool        freeze;         /* freeze rows on loading? */
    bool        csv_mode;        /* Comma Separated Value
format? */
-    bool        header_line;    /* CSV or text header line? */
+    CopyHeader  header_line;    /* CSV or text header line? */


After the redefinition into this enum type, there are still a
bunch of references to header_line that treat it like a boolean:

1190:            if (cstate->header_line)
1398:    if (cstate->binary && cstate->header_line)
2119:        if (cstate->header_line)
3635:    if (cstate->cur_lineno == 0 && cstate->header_line)

It works fine since COPY_HEADER_ABSENT is 0 as the first value of the enum,
but maybe it's not good style to count on that.



+            PG_TRY();
+            {
+                if (defGetBoolean(defel))
+                    cstate->header_line =
COPY_HEADER_PRESENT;
+                else
+                    cstate->header_line =
COPY_HEADER_ABSENT;
+            }
+            PG_CATCH();
+            {
+                char    *sval = defGetString(defel);
+
+                if (!cstate->is_copy_from)
+                    PG_RE_THROW();
+
+                if (pg_strcasecmp(sval, "match") == 0)
+                    cstate->header_line =
COPY_HEADER_MATCH;
+                else
+                    ereport(ERROR,
+
(errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("header requires a
boolean or \"match\"")));
+            }
+            PG_END_TRY();

It seems wrong to use a PG_CATCH block for this. I understand that
it's because defGetBoolean() calls ereport() on non-booleans, but then
it should be split into an error-throwing function and a
non-error-throwing lexical analysis of the boolean, the above code
calling the latter.
Besides the comments in elog.h above PG_TRY say that
 "the error recovery code
  can either do PG_RE_THROW to propagate the error outwards, or do a
  (sub)transaction abort. Failure to do so may leave the system in an
  inconsistent state for further processing."
Maybe this is what happens with the repeated uses of "match"
eventually failing with ERRORDATA_STACK_SIZE exceeded.


-    HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+    HEADER { <literal>match</literal> | <literal>true</literal> |
<literal>false</literal> }

This should be enclosed in square brackets because HEADER
with no argument is still accepted.




+      names from the table. On input, the first line is discarded when set
+      to <literal>true</literal> or required to match the column names if
set

The elision of "header" as the subject might be misinterpreted as if
it's the first line that is true.  I'd suggest
"when <literal>header>/literal> is set to ..."    to avoid any confusion.



Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite



pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Re: proposal: possibility to read dumped table's name from file
Next
From: Tom Lane
Date:
Subject: Re: INSERT INTO SELECT, Why Parallelism is not selected?