Re: Add header support to text format and matching feature - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: Add header support to text format and matching feature
Date
Msg-id d35e05ac-fde1-0dba-39bf-b30e75082b48@dunslane.net
Whole thread Raw
In response to Re: Add header support to text format and matching feature  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: Add header support to text format and matching feature
List pgsql-hackers
On 2022-06-07 Tu 11:47, Julien Rouhaud wrote:
> Hi,
>
> On Wed, Mar 30, 2022 at 09:11:09AM +0200, Peter Eisentraut wrote:
>> Committed, after some further refinements as discussed.
> While working on nearby code, I found some problems with this feature.
>
> First, probably nitpicking, the HEADER MATCH is allowed for COPY TO, is that
> expected?  The documentation isn't really explicit about it, but there's
> nothing to match when exporting data it's a bit surprising.  I'm not opposed to
> have HEADER MATCH means HEADER ON for COPY TO, as as-is one can easily reuse
> the commands history, but maybe it should be clearly documented?


I think it makes more sense to have a sanity check to prevent HEADER
MATCH with COPY TO.


>
> Then, apparently HEADER MATCH doesn't let you do sanity checks against a custom
> column list.  This one looks like a clear oversight, as something like that
> should be entirely valid IMHO:
>
> CREATE TABLE tbl(col1 int, col2 int);
> COPY tbl (col2, col1) TO '/path/to/file' WITH (HEADER MATCH);
> COPY tbl (col2, col1) FROM '/path/to/file' WITH (HEADER MATCH);
>
> but right now it errors out with:
>
> ERROR:  column name mismatch in header line field 1: got "col1", expected "col2"
>
> Note that the error message is bogus if you specify attributes in a
> different order from the relation, as the code is mixing access to the tuple
> desc and access to the raw fields with the same offset.
>
> This also means that it will actually fail to detect a mismatch in the provided
> column list and let you import data in the wrong position as long as the
> datatypes are compatible and the column header in the file are in the correct
> order.  For instance:
>
> CREATE TABLE abc (a text, b text, c text);
> INSERT INTO abc SELECT 'a', 'b', 'c';
> COPY abc TO '/path/to/file' WITH (HEADER MATCH);
>
> You can then import the data with any of those:
> COPY abc(c, b, a) TO '/path/to/file' WITH (HEADER MATCH);
> COPY abc(c, a, b) TO '/path/to/file' WITH (HEADER MATCH);
> [...]
> SELECT * FROM abc;
>
> Even worse, if you try to do a COPY ... FROM ... WITH (HEADER ON) on a table
> that has some dropped attribute(s).  The current code will access random memory
> as there's no exact attnum / raw field mapping anymore.


Ouch! That certainly needs to be fixed.


>
> I can work on a fix if needed (with some additional regression test to cover
> those cases), but I'm still not sure that having a user provided column list is
> supposed to be accepted or not for the HEADER MATCH.  In the meantime I will
> add an open item.
>
>


I think it should, but a temporary alternative would be to forbid HEADER
MATCH with explicit column lists until we can make it work right.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




pgsql-hackers by date:

Previous
From: Aleksander Alekseev
Date:
Subject: Quick question regarding HeapTupleHeaderData.t_ctid
Next
From: Jim Popovitch
Date:
Subject: connection failures on forked processes