
Re: Re: COPY table_name (single_column) FROM 'iso-8859-1.txt' DELIMITER E'\n' - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Re: COPY table_name (single_column) FROM 'iso-8859-1.txt' DELIMITER E'\n'
Date	May 5, 2021 18:45:41
Msg-id	4040826.1620240341@sss.pgh.pa.us
In response to	Re: COPY table_name (single_column) FROM 'iso-8859-1.txt' DELIMITER E'\n' ("Joel Jacobson" <joel@compiler.org>)
Responses	Re: COPY table_name (single_column) FROM 'iso-8859-1.txt' DELIMITER E'\n' Re: COPY table_name (single_column) FROM 'iso-8859-1.txt' DELIMITER E'\n'
List	pgsql-hackers


"Joel Jacobson" <joel@compiler.org> writes:
> I think you misunderstood the problem.
> I don't want the entire file to be considered a single value.
> I want each line to become its own row, just a row with a single column.

> So I actually think COPY seems like a perfect match for the job,
> since it does precisely that, except there is no delimiter in this case.

Well, there's more to it than just the column delimiter.

* What about \N being converted to NULL?
* What about \. being treated as EOF?
* Do you want to turn off the special behavior of backslash (ESCAPE)
  altogether?
* What about newline conversions (\r\n being seen as just \n, etc)?
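To make the points above concrete, here is a rough Python sketch of how a few of these text-format rules could change single-column input. The function name toy_copy_lines and the sample data are hypothetical, and this is only a toy model under simplifying assumptions (default NULL marker, a handful of escapes, only \r\n and \n line endings), not PostgreSQL's actual parser.

```python
from typing import List, Optional

# Toy model only: a few backslash escapes; anything else keeps the
# character that follows the backslash.
ESCAPES = {"\\": "\\", "n": "\n", "r": "\r", "t": "\t"}


def toy_copy_lines(data: str) -> List[Optional[str]]:
    """Apply a simplified subset of text-format COPY rules to one column."""
    rows: List[Optional[str]] = []
    # Newline conversion: treat \r\n the same as \n.
    text = data.replace("\r\n", "\n")
    if text.endswith("\n"):
        text = text[:-1]  # a final newline ends the last row, it adds no row
    for line in text.split("\n"):
        if line == "\\.":  # end-of-data marker: stop reading here
            break
        if line == "\\N":  # default NULL marker
            rows.append(None)
            continue
        out = []
        i = 0
        while i < len(line):
            ch = line[i]
            if ch == "\\" and i + 1 < len(line):
                nxt = line[i + 1]
                out.append(ESCAPES.get(nxt, nxt))
                i += 2
            else:
                out.append(ch)
                i += 1
        rows.append("".join(out))
    return rows


# Sample input: a Windows-style path, a NULL marker, a plain line,
# the end-of-data marker, and a line after it.
sample = "C:\\temp\r\n\\N\nplain\n\\.\nnever read\n"
rows = toy_copy_lines(sample)
```

In this model the path line comes back with a tab where the file had a backslash followed by "t", the \N line becomes a null, and nothing after the \. line is loaded, which is why a "just one row per line" mode has to decide each of these cases.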

I'm inclined to think that "use pg_read_file and then split at newlines"
might be a saner answer than delving into all these fine points.
Not least because people yell when you add cycles to the COPY
inner loops.

> I'm currently using the pg_read_file()-hack in a project,
> and even though it can read files up to 1GB,
> using e.g. regexp_split_to_table() to split on E'\n'
> seems to need 4x as much memory, so it only
> works with files less than ~256MB.

Yeah, that's because of the conversion to "chr".  But a regexp
is overkill for that anyway.  Don't we have something that will
split on simple substring matches?
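As a language-neutral illustration of both remarks, the Python sketch below splits on a fixed newline with and without a regular expression, and compares the size of the same text at one byte per character against four bytes per character. It assumes that "chr" refers to a fixed four-byte internal character representation used for regexp matching; the sample text and variable names are made up for the example. On the PostgreSQL side, string_to_array() is one built-in that splits on a literal delimiter string rather than a pattern.

```python
import re

text = "alpha\nbeta\ngamma"

# A fixed delimiter needs no pattern matching: plain substring splitting
# yields the same pieces as the regexp version.
by_substring = text.split("\n")
by_regexp = re.split(r"\n", text)

# Size of the same ASCII text at one byte per character versus a fixed
# four bytes per character (assumed to mirror the wide-character form).
narrow_bytes = len(text.encode("latin-1"))
wide_bytes = len(text.encode("utf-32-le"))

print(by_substring == by_regexp, narrow_bytes, wide_bytes)
```

The factor of four between the two sizes matches the memory blow-up described in the quoted text for single-byte input.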

            regards, tom lane
