Re: COPY table_name (single_column) FROM 'iso-8859-1.txt' DELIMITER E'\n' - Mailing list pgsql-hackers

From Joel Jacobson
Subject Re: COPY table_name (single_column) FROM 'iso-8859-1.txt' DELIMITER E'\n'
Date
Msg-id 45428207-d977-4c15-94b3-d85dd2911e10@www.fastmail.com
Whole thread Raw
In response to Re: COPY table_name (single_column) FROM 'iso-8859-1.txt' DELIMITER E'\n'  ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses Re: Re: COPY table_name (single_column) FROM 'iso-8859-1.txt' DELIMITER E'\n'
List pgsql-hackers
On Wed, May 5, 2021, at 19:58, David G. Johnston wrote:
On Wed, May 5, 2021 at 10:34 AM Isaac Morland <isaac.morland@gmail.com> wrote:
On Wed, 5 May 2021 at 13:23, Chapman Flack <chap@anastigmatix.net> wrote:
On 05/05/21 13:02, David G. Johnston wrote:
> Why not just allow: "DELIMITER NONE" to be valid syntax meaning exactly
> what it says and does exactly what you desire?

What would it mean? That you get one column, multiple rows of text
corresponding to "lines" delimited by something, or that you get one
column, one row of text for the entire content of the file?

It means no column delimiter. In other words, there is no character which marks the end of a data value, so the entire line is a single data value.


This.  When dealing with COPY it's expected that each line becomes its own row.  On the server you can do pg_read_file() if you need the entire file to be considered a single value.  psql (\I and variables) is a bit more hackey, but I'd rather see that improved directly anyway if the goal is to try and make getting the "whole document" easier - copy isn't the right API for that IMO.

I think you misunderstood the problem.
I don't want the entire file to be considered a single value.
I want each line to become its own row, just a row with a single column.

So I actually think COPY seems like a perfect match for the job,
since it does precisely that, except there is no delimiter in this case.

I'm currently using the pg_read_file()-hack in a project,
and even though it can read files up to 1GB,
using e.g. regexp_split_to_table() to split on E'\n'
seems to need 4x as much memory, so it only
works with files less than ~256MB.

SELECT COUNT(*) FROM regexp_split_to_table(repeat(E'\n',1000000000),E'\n');
ERROR:  invalid memory alloc request size 4000000004
Time: 4151.374 ms (00:04.151)

/Joel

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: RFC: Detailed reorder buffer stats dumps
Next
From: Peter Geoghegan
Date:
Subject: Re: MaxOffsetNumber for Table AMs