Re: Add header support to text format and matching feature - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Add header support to text format and matching feature
Date
Msg-id 20220607154744.vvmitnqhyxrne5ms@jrouhaud
Whole thread Raw
In response to Re: Add header support to text format and matching feature  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
Responses Re: Add header support to text format and matching feature  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
Hi,

On Wed, Mar 30, 2022 at 09:11:09AM +0200, Peter Eisentraut wrote:
>
> Committed, after some further refinements as discussed.

While working on nearby code, I found some problems with this feature.

First, probably nitpicking, the HEADER MATCH is allowed for COPY TO, is that
expected?  The documentation isn't really explicit about it, but there's
nothing to match when exporting data it's a bit surprising.  I'm not opposed to
have HEADER MATCH means HEADER ON for COPY TO, as as-is one can easily reuse
the commands history, but maybe it should be clearly documented?

Then, apparently HEADER MATCH doesn't let you do sanity checks against a custom
column list.  This one looks like a clear oversight, as something like that
should be entirely valid IMHO:

CREATE TABLE tbl(col1 int, col2 int);
COPY tbl (col2, col1) TO '/path/to/file' WITH (HEADER MATCH);
COPY tbl (col2, col1) FROM '/path/to/file' WITH (HEADER MATCH);

but right now it errors out with:

ERROR:  column name mismatch in header line field 1: got "col1", expected "col2"

Note that the error message is bogus if you specify attributes in a
different order from the relation, as the code is mixing access to the tuple
desc and access to the raw fields with the same offset.

This also means that it will actually fail to detect a mismatch in the provided
column list and let you import data in the wrong position as long as the
datatypes are compatible and the column header in the file are in the correct
order.  For instance:

CREATE TABLE abc (a text, b text, c text);
INSERT INTO abc SELECT 'a', 'b', 'c';
COPY abc TO '/path/to/file' WITH (HEADER MATCH);

You can then import the data with any of those:
COPY abc(c, b, a) TO '/path/to/file' WITH (HEADER MATCH);
COPY abc(c, a, b) TO '/path/to/file' WITH (HEADER MATCH);
[...]
SELECT * FROM abc;

Even worse, if you try to do a COPY ... FROM ... WITH (HEADER ON) on a table
that has some dropped attribute(s).  The current code will access random memory
as there's no exact attnum / raw field mapping anymore.

I can work on a fix if needed (with some additional regression test to cover
those cases), but I'm still not sure that having a user provided column list is
supposed to be accepted or not for the HEADER MATCH.  In the meantime I will
add an open item.



pgsql-hackers by date:

Previous
From: Rod Taylor
Date:
Subject: Re: Collation version tracking for macOS
Next
From: Robert Haas
Date:
Subject: Re: pgcon unconference / impact of block size on performance