Thread: Two small questions re/ COPY CSV data into table
Hello, I'm using COPY to load CSV data into the table in PG 10.x and have to simple questions: 1. I do use as command: COPY adm_cat FROM '/home/sisis/PostgreSQL/PDcsv/adm_cat.load' WITH DELIMITER '|' CSV ; which works fine. When I read the PG docs the syntax should be ... WITH DELIMITER '|', FORMAT CSV ; But this gives an syntax error as: LINE 1: .../PostgreSQL/PDcsv/adm_cat.load' WITH DELIMITER '|', FORMAT C... and the ',' is marked as its place. Why? 2. The CSV export is done by some Sybase tool which escapes the delimiter as '\|', i.e. putting a backslash before the delimiter. I found no way that COPY understands this excaping. Any ideas? Thanks matthias -- Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045 Public GnuPG key: http://www.unixarea.de/key.pub May, 9: Спаси́бо освободители! Thank you very much, Russian liberators!
>>>>> "Matthias" == Matthias Apitz <guru@unixarea.de> writes: Matthias> ... WITH DELIMITER '|', FORMAT CSV ; I think you misread the docs; the new-style syntax would be COPY ... WITH (DELIMITER '|', FORMAT CSV); where the parens are not optional. The old-style syntax with no parens after WITH is what you were using before. Matthias> 2. The CSV export is done by some Sybase tool which escapes Matthias> the delimiter as '\|', i.e. putting a backslash before the Matthias> delimiter. I found no way that COPY understands this Matthias> excaping. Any ideas? That sounds like the file is not actually a CSV - why do you think it is? PG accepts two formats (actually 3 if you count binary format which is rarely used): TEXT: records are delimited by newlines, columns are delimited by a delimiter character (default tab), there are no quotation marks, any newline, backslash, or delimiter in the data must be escaped as a backslash-sequence (e.g. \n or \012) CSV: columns may be quoted (in which case delimiters and newlines inside them are ignored), records are delimited by newlines _outside_ quoted fields, there are no backslash-sequences or escapes outside of quoted fields, quote characters inside quoted fields are doubled (though there's an option to change this). PG follows the CSV spec at https://www.ietf.org/rfc/rfc4180.txt fairly closely. -- Andrew (irc:RhodiumToad)
El día Tuesday, June 04, 2019 a las 11:32:45AM +0100, Andrew Gierth escribió: > Matthias> 2. The CSV export is done by some Sybase tool which escapes > Matthias> the delimiter as '\|', i.e. putting a backslash before the > Matthias> delimiter. I found no way that COPY understands this > Matthias> excaping. Any ideas? > > That sounds like the file is not actually a CSV - why do you think it > is? > > ... Well, it's not strictly CSV, but it is what the Sybase tool produces. The delimiter is set to '|' and a data example line looks like: 0|1| cat $1 \| lpr -Pprinter |3|4 I do load this now with COPY in mode TEXT and modify the data before with: sed 's/|/\v/g' < table-from-Sybase | sed 's/\\\v/|/g' > table-for-copy Works fine. matthias -- Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045 Public GnuPG key: http://www.unixarea.de/key.pub May, 9: Спаси́бо освободители! Thank you very much, Russian liberators!
>>>>> "Matthias" == Matthias Apitz <guru@unixarea.de> writes: >> That sounds like the file is not actually a CSV - why do you think >> it is? Matthias> Well, it's not strictly CSV, In other words it's not a CSV at all. Matthias> I do load this now with COPY in mode TEXT and modify the data Matthias> before with: Matthias> sed 's/|/\v/g' < table-from-Sybase | sed 's/\\\v/|/g' > table-for-copy What on earth is this supposed to achieve? -- Andrew (irc:RhodiumToad)
El día Tuesday, June 04, 2019 a las 04:41:47PM +0100, Andrew Gierth escribió: > Matthias> I do load this now with COPY in mode TEXT and modify the data > Matthias> before with: > > Matthias> sed 's/|/\v/g' < table-from-Sybase | sed 's/\\\v/|/g' > table-for-copy > > What on earth is this supposed to achieve? It first translates any char '|' to vtab and then any '\vtab' (i.e. a backslash followed by a vtab) back to the char | The new DELIMITER for PG is then vtab and the | is just a char in the data (in the example above a pipe of two UNIX cmd). Do you do UNIX? matthias -- Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045 Public GnuPG key: http://www.unixarea.de/key.pub May, 9: Спаси́бо освободители! Thank you very much, Russian liberators!
>>>>> "Matthias" == Matthias Apitz <guru@unixarea.de> writes: Matthias> sed 's/|/\v/g' < table-from-Sybase | sed 's/\\\v/|/g' > table-for-copy >> What on earth is this supposed to achieve? Matthias> It first translates any char '|' to vtab and then any '\vtab' Matthias> (i.e. a backslash followed by a vtab) back to the char | Matthias> The new DELIMITER for PG is then vtab and the | is just a char in the Matthias> data (in the example above a pipe of two UNIX cmd). Yes, but why? If you use COPY ... WITH DELIMITER '|' (i.e. text mode, not CSV mode) then the \| is accepted as being a literal | and the unescaped | is treated as a delimiter. What is the point of the substitutions? -- Andrew (irc:RhodiumToad)
El día Tuesday, June 04, 2019 a las 05:56:49PM +0100, Andrew Gierth escribió: > Matthias> The new DELIMITER for PG is then vtab and the | is just a char in the > Matthias> data (in the example above a pipe of two UNIX cmd). > > Yes, but why? > > If you use COPY ... WITH DELIMITER '|' (i.e. text mode, not CSV mode) > then the \| is accepted as being a literal | and the unescaped | is > treated as a delimiter. What is the point of the substitutions? In the original data record in Sybase there is a column containing, i.e. some UNIX print command to be read and executed by the application: cmd | lpr ... it is exported into the file as cmd \| lpr ... because the export delimiter is set to '|'; the COPY in text mode was complaining because it interpreted the | of the sequence \| as an delimiter and the number of columns did not matched. I will provide tomorrow the exact input line, the exact COPY command and the error. matthias -- Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045 Public GnuPG key: http://www.unixarea.de/key.pub May, 9: Спаси́бо освободители! Thank you very much, Russian liberators!
>>>>> "Matthias" == Matthias Apitz <guru@unixarea.de> writes: Matthias> it is exported into the file as Matthias> cmd \| lpr ... Matthias> because the export delimiter is set to '|'; Matthias> the COPY in text mode was complaining because it interpreted Matthias> the | of the sequence \| as an delimiter and the number of Matthias> columns did not matched. Matthias> I will provide tomorrow the exact input line, the exact COPY Matthias> command and the error. This worked for me: postgres=# copy ct(a,b,c,d,e) from stdin with delimiter '|'; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> 0|1| cat $1 \| lpr -Pprinter |3|4 >> \. COPY 1 -- Andrew (irc:RhodiumToad)
Mathias: On Tue, Jun 4, 2019 at 8:35 PM Matthias Apitz <guru@unixarea.de> wrote: > El día Tuesday, June 04, 2019 a las 05:41:43PM +0200, Francisco Olarte escribió: > > On Tue, Jun 4, 2019 at 5:03 PM Matthias Apitz <guru@unixarea.de> wrote: > > > Well, it's not strictly CSV, but it is what the Sybase tool produces. The > > > delimiter is set to '|' and a data example line looks like: > > > 0|1| cat $1 \| lpr -Pprinter |3|4 > > > I do load this now with COPY in mode TEXT and modify the data before > > > with: > > > sed 's/|/\v/g' < table-from-Sybase | sed 's/\\\v/|/g' > table-for-copy > > You are lucky it is not CSV, it is much better, it's similar to the > > text format originally used by postgres. Your problem is just it uses > > | instead of TAB for field delimiter. > > What I d not know is why you use \v, doesn't sed use \t for TAB? > I was thinking about \t and was afraid, that in ~2 GByte char fields > even some \t could be used in the data. That's why I decided to use \v > which is more unlikely to be used in our data. Fine then. I haven't got much sed-foo under my belt, but if you do this in perl ( I learned perl a while ago, forgot all awk and sed to free some brain cells ;-> ) or something similar you can use high code points or control chars ( and I do not know if sed can do several replacements in a pass, but perl certainly can ). Something like ( untested ) "perl -pe 's/\\\\/\001/g; s/\\\|/\002/; s/\|/\t/; s/\002/|/; s/\001/\\\\/", the advantage over the pipe approach may be noticeable on big files, as it cuts a lot of context switches ( OTOH it'll run single-core ). ( the sample tries to do scaped backslash to soh scaped pipe to stx pipe to tab stx to unescaped pipe soh to escaped backlash ) In fact you can do "tab to backslash t" after phase 2 and tab would be ok in the input. But, as I said before and others have also pointed, COPY is perfectly happy to use pipe as delimiter and havinf it escaped with backslash on input. > (You didn't cc'ed the list, by intention?) No, I forgot to hit reply-all because I did not pay enough attention, my fault. Doing it now in case some one thinks this is interesting. Francisco Olarte.
El día Tuesday, June 04, 2019 a las 07:20:54PM +0200, Matthias Apitz escribió: > El día Tuesday, June 04, 2019 a las 05:56:49PM +0100, Andrew Gierth escribió: > > > If you use COPY ... WITH DELIMITER '|' (i.e. text mode, not CSV mode) > > then the \| is accepted as being a literal | and the unescaped | is > > treated as a delimiter. What is the point of the substitutions? > > ... > I will provide tomorrow the exact input line, the exact COPY command and > the error. It works exactly as Andrew states. My error was not caused by treating the '|' of '\|' as DELIMITER, but by not having WITH (NULL '' ...) in the COPY cmd. I misinterpreted the error message as not matching number of columns. Sorry for the noise. Thanks matthias -- Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045 Public GnuPG key: http://www.unixarea.de/key.pub May, 9: Спаси́бо освободители! Thank you very much, Russian liberators!