Re: Add header support to text format and matching feature - Mailing list pgsql-hackers
From | Andrew Dunstan |
---|---|
Subject | Re: Add header support to text format and matching feature |
Date | |
Msg-id | d35e05ac-fde1-0dba-39bf-b30e75082b48@dunslane.net Whole thread Raw |
In response to | Re: Add header support to text format and matching feature (Julien Rouhaud <rjuju123@gmail.com>) |
Responses |
Re: Add header support to text format and matching feature
|
List | pgsql-hackers |
On 2022-06-07 Tu 11:47, Julien Rouhaud wrote: > Hi, > > On Wed, Mar 30, 2022 at 09:11:09AM +0200, Peter Eisentraut wrote: >> Committed, after some further refinements as discussed. > While working on nearby code, I found some problems with this feature. > > First, probably nitpicking, the HEADER MATCH is allowed for COPY TO, is that > expected? The documentation isn't really explicit about it, but there's > nothing to match when exporting data it's a bit surprising. I'm not opposed to > have HEADER MATCH means HEADER ON for COPY TO, as as-is one can easily reuse > the commands history, but maybe it should be clearly documented? I think it makes more sense to have a sanity check to prevent HEADER MATCH with COPY TO. > > Then, apparently HEADER MATCH doesn't let you do sanity checks against a custom > column list. This one looks like a clear oversight, as something like that > should be entirely valid IMHO: > > CREATE TABLE tbl(col1 int, col2 int); > COPY tbl (col2, col1) TO '/path/to/file' WITH (HEADER MATCH); > COPY tbl (col2, col1) FROM '/path/to/file' WITH (HEADER MATCH); > > but right now it errors out with: > > ERROR: column name mismatch in header line field 1: got "col1", expected "col2" > > Note that the error message is bogus if you specify attributes in a > different order from the relation, as the code is mixing access to the tuple > desc and access to the raw fields with the same offset. > > This also means that it will actually fail to detect a mismatch in the provided > column list and let you import data in the wrong position as long as the > datatypes are compatible and the column header in the file are in the correct > order. For instance: > > CREATE TABLE abc (a text, b text, c text); > INSERT INTO abc SELECT 'a', 'b', 'c'; > COPY abc TO '/path/to/file' WITH (HEADER MATCH); > > You can then import the data with any of those: > COPY abc(c, b, a) TO '/path/to/file' WITH (HEADER MATCH); > COPY abc(c, a, b) TO '/path/to/file' WITH (HEADER MATCH); > [...] > SELECT * FROM abc; > > Even worse, if you try to do a COPY ... FROM ... WITH (HEADER ON) on a table > that has some dropped attribute(s). The current code will access random memory > as there's no exact attnum / raw field mapping anymore. Ouch! That certainly needs to be fixed. > > I can work on a fix if needed (with some additional regression test to cover > those cases), but I'm still not sure that having a user provided column list is > supposed to be accepted or not for the HEADER MATCH. In the meantime I will > add an open item. > > I think it should, but a temporary alternative would be to forbid HEADER MATCH with explicit column lists until we can make it work right. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
pgsql-hackers by date: