Hi,
From time to time, I need to deal with bizarrely delimited text files,
having to use tools such as awk/sed/perl to reformat the files
so that they can be copied into PostgreSQL.
If such files could be imported to a table with a single text column,
we could then use PostgreSQL's now really fast regex-engine
to do data cleaning/reformatting, instead of having to rely on external awk-ward tools.
Furthermore, sometimes you don't want to clean/reformat the data at all,
but simply import the text lines "as is" without modifications,
such as when wanting to import unformatted log files,
where the log lines can contain any characters.
Could it be an idea to exploit the fact that DELIMITER E'\n' is currently an error?
ERROR: COPY delimiter cannot be newline or carriage return
That is, to change E'\n' to be a valid delimiter, which would simply read each line
delimited by newlines, as a single column.
The hack I'm currently abusing is to find some one-byte character that is not present anywhere in the text file,
and then to use that character as a delimiter. This doesn't work when needing to deal with a text file
which content is unknown at the time when writing the code though, so it's mostly useful for throwaway one-off queries.
Thoughts?
/Joel
MySQL seems to already support using \n as a delimiter (I haven't verified it myself though) [1]