Home > mailing lists

Re: COPY table_name (single_column) FROM 'iso-8859-1.txt' DELIMITER E'\n' - Mailing list pgsql-hackers

From	Joel Jacobson
Subject	Re: COPY table_name (single_column) FROM 'iso-8859-1.txt' DELIMITER E'\n'
Date	May 5, 2021 18:35:03
Msg-id	45428207-d977-4c15-94b3-d85dd2911e10@www.fastmail.com Whole thread Raw
In response to	Re: COPY table_name (single_column) FROM 'iso-8859-1.txt' DELIMITER E'\n' ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses	Re: Re: COPY table_name (single_column) FROM 'iso-8859-1.txt' DELIMITER E'\n'
List	pgsql-hackers

Tree view

On Wed, May 5, 2021, at 19:58, David G. Johnston wrote:

On Wed, May 5, 2021 at 10:34 AM Isaac Morland <isaac.morland@gmail.com> wrote:
On Wed, 5 May 2021 at 13:23, Chapman Flack <chap@anastigmatix.net> wrote:
On 05/05/21 13:02, David G. Johnston wrote:
> Why not just allow: "DELIMITER NONE" to be valid syntax meaning exactly
> what it says and does exactly what you desire?

What would it mean? That you get one column, multiple rows of text
corresponding to "lines" delimited by something, or that you get one
column, one row of text for the entire content of the file?

It means no column delimiter. In other words, there is no character which marks the end of a data value, so the entire line is a single data value.

This. When dealing with COPY it's expected that each line becomes its own row. On the server you can do pg_read_file() if you need the entire file to be considered a single value. psql (\I and variables) is a bit more hackey, but I'd rather see that improved directly anyway if the goal is to try and make getting the "whole document" easier - copy isn't the right API for that IMO.

I think you misunderstood the problem.

I don't want the entire file to be considered a single value.

I want each line to become its own row, just a row with a single column.

So I actually think COPY seems like a perfect match for the job,

since it does precisely that, except there is no delimiter in this case.

I'm currently using the pg_read_file()-hack in a project,

and even though it can read files up to 1GB,

using e.g. regexp_split_to_table() to split on E'\n'

seems to need 4x as much memory, so it only

works with files less than ~256MB.

SELECT COUNT(*) FROM regexp_split_to_table(repeat(E'\n',1000000000),E'\n');

ERROR: invalid memory alloc request size 4000000004

Time: 4151.374 ms (00:04.151)

/Joel

pgsql-hackers by date:

From: Andres Freund
Date: 05 May 2021, 18:28:43
Subject: Re: RFC: Detailed reorder buffer stats dumps

From: Peter Geoghegan
Date: 05 May 2021, 18:35:45
Subject: Re: MaxOffsetNumber for Table AMs

Re: COPY table_name (single_column) FROM 'iso-8859-1.txt' DELIMITER E'\n' - Mailing list pgsql-hackers

Previous

Next