Re: Add header support to text format and matching feature - Mailing list pgsql-hackers

From Rémi Lapeyre
Subject Re: Add header support to text format and matching feature
Date
Msg-id 0B55BD07-83E4-439F-AACC-FA2D7CF50532@lenstra.fr
Whole thread Raw
In response to Re: Add header support to text format and matching feature  (vignesh C <vignesh21@gmail.com>)
Responses Re: Add header support to text format and matching feature  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
Thanks Daniel for the review and Vignesh for addressing the comments.

I have two remarks with the state of the current patches:
- DefGetCopyHeader() duplicates a lot of code from defGetBoolean(), should we refactor this so that they can share more
oftheir internals? In the current implementation any change to defGetBoolean() should be made to DefGetCopyHeader() too
ortheir behaviour will subtly differ. 
- It is possible to set the header option multiple time:
     \copy x(i) from file.txt with (format csv, header off, header on);
  In which case the last one is the one kept. I think this is a bug and it should be fixed, but this is already the
behaviourin the current implementation so fixing it would not be backward compatible. Do you think users should not do
thisand I can fix it or that keeping the current behaviour is better for backward compatibility? 

Regards,
Rémi

> Le 17 août 2020 à 14:49, vignesh C <vignesh21@gmail.com> a écrit :
>
> Thanks for your comments, Please find my thoughts inline.
>
>> In my tests it works fine except for one crash that I can reproduce
>> on a fresh build and default configuration with:
>>
>> $ cat >file.txt
>> i
>> 1
>>
>> $ psql
>> postgres=# create table x(i int);
>> CREATE TABLE
>> postgres=# \copy x(i) from file.txt with (header match)
>> COPY 1
>> postgres=# \copy x(i) from file.txt with (header match)
>> COPY 1
>> postgres=# \copy x(i) from file.txt with (header match)
>> COPY 1
>> postgres=# \copy x(i) from file.txt with (header match)
>> COPY 1
>> postgres=# \copy x(i) from file.txt with (header match)
>> COPY 1
>> postgres=# \copy x(i) from file.txt with (header match)
>> PANIC:  ERRORDATA_STACK_SIZE exceeded
>> server closed the connection unexpectedly
>>        This probably means the server terminated abnormally
>>        before or while processing the request.
>> The connection to the server was lost. Attempting reset: Failed.
>>
>
> Fixed, replaced PG_TRY/PG_CATCH with strcmp logic to get the header option.
>
>>
>> Code comments:
>>
>>
>> +/*
>> + * Represents whether the header must be absent, present or present and
>> match.
>> + */
>> +typedef enum CopyHeader
>> +{
>> +       COPY_HEADER_ABSENT,
>> +       COPY_HEADER_PRESENT,
>> +       COPY_HEADER_MATCH
>> +} CopyHeader;
>> +
>> /*
>>  * This struct contains all the state variables used throughout a COPY
>>  * operation. For simplicity, we use the same struct for all variants of
>> COPY,
>> @@ -136,7 +146,7 @@ typedef struct CopyStateData
>>        bool            binary;                 /* binary format? */
>>        bool            freeze;                 /* freeze rows on loading? */
>>        bool            csv_mode;               /* Comma Separated Value
>> format? */
>> -       bool            header_line;    /* CSV or text header line? */
>> +       CopyHeader  header_line;        /* CSV or text header line? */
>>
>>
>> After the redefinition into this enum type, there are still a
>> bunch of references to header_line that treat it like a boolean:
>>
>> 1190:                   if (cstate->header_line)
>> 1398:   if (cstate->binary && cstate->header_line)
>> 2119:           if (cstate->header_line)
>> 3635:   if (cstate->cur_lineno == 0 && cstate->header_line)
>>
>> It works fine since COPY_HEADER_ABSENT is 0 as the first value of the enum,
>> but maybe it's not good style to count on that.
>
> Fixed. Changed it to cstate->header_line != COPY_HEADER_ABSENT.
>
>>
>>
>>
>> +                       PG_TRY();
>> +                       {
>> +                               if (defGetBoolean(defel))
>> +                                       cstate->header_line =
>> COPY_HEADER_PRESENT;
>> +                               else
>> +                                       cstate->header_line =
>> COPY_HEADER_ABSENT;
>> +                       }
>> +                       PG_CATCH();
>> +                       {
>> +                               char    *sval = defGetString(defel);
>> +
>> +                               if (!cstate->is_copy_from)
>> +                                       PG_RE_THROW();
>> +
>> +                               if (pg_strcasecmp(sval, "match") == 0)
>> +                                       cstate->header_line =
>> COPY_HEADER_MATCH;
>> +                               else
>> +                                       ereport(ERROR,
>> +
>> (errcode(ERRCODE_SYNTAX_ERROR),
>> +                                                errmsg("header requires a
>> boolean or \"match\"")));
>> +                       }
>> +                       PG_END_TRY();
>>
>> It seems wrong to use a PG_CATCH block for this. I understand that
>> it's because defGetBoolean() calls ereport() on non-booleans, but then
>> it should be split into an error-throwing function and a
>> non-error-throwing lexical analysis of the boolean, the above code
>> calling the latter.
>> Besides the comments in elog.h above PG_TRY say that
>> "the error recovery code
>>  can either do PG_RE_THROW to propagate the error outwards, or do a
>>  (sub)transaction abort. Failure to do so may leave the system in an
>>  inconsistent state for further processing."
>> Maybe this is what happens with the repeated uses of "match"
>> eventually failing with ERRORDATA_STACK_SIZE exceeded.
>>
>
> Fixed, replaced PG_TRY/PG_CATCH with strcmp logic to get the header option.
>
>>
>> -    HEADER [ <replaceable class="parameter">boolean</replaceable> ]
>> +    HEADER { <literal>match</literal> | <literal>true</literal> |
>> <literal>false</literal> }
>>
>> This should be enclosed in square brackets because HEADER
>> with no argument is still accepted.
>>
>
> Fixed.
>
>>
>>
>>
>> +      names from the table. On input, the first line is discarded when set
>> +      to <literal>true</literal> or required to match the column names if
>> set
>>
>> The elision of "header" as the subject might be misinterpreted as if
>> it's the first line that is true.  I'd suggest
>> "when <literal>header>/literal> is set to ..."  to avoid any confusion.
>>
>
> Fixed.
>
> Attached v5 patch with the fixes of above comments.
> Thoughts?
>
> Regards,
> Vignesh
> EnterpriseDB: http://www.enterprisedb.com
> <v5-0001-Add-header-support-to-COPY-TO-text-format.patch><v5-0002-Add-header-matching-mode-to-COPY-FROM.patch>




pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: factorial function/phase out postfix operators?
Next
From: Robert Haas
Date:
Subject: Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY