Thread: [PERFORMANCE] Error loading 37G CSV file "invalid string enlargement request size 65536"

Hi all,

I'm running a COPY of a 37G CSV and receiving the following error:

"invalid string enlargement request size 65536"

The file has about 70 million lines with 101 columns, all of them varchar.

When I run the command with the whole file I receive the error after loading
about 29 million lines, so I've split the file into 10-million-line chunks with split:

split --lines=10000000

And running the COPY I receive the error on the 5th file:

psql:/srv/www/htdocs/import/script_q2.sql:122: ERROR:  invalid string enlargement request size 65536
CONTEXT:  COPY temp_q2, line 3509639: ""000000009367276";"4";"DANIEL DO CARMO BARROS";"31-Jan-1986";"M";"1";"10";"3162906";"GILSON TEIXEIRA..."

Any clues?

My PostgreSQL version is 8.2.4; the server is running SUSE Linux with 1.5GB of RAM.
The relevant changes in postgresql.conf are:

shared_buffers = 512MB
temp_buffers = 256MB
checkpoint_segments = 60

I'd also like to know if there's any way to optimize huge data loads in operations like these.

Regards

Adonias Malosso
"Adonias Malosso" <malosso@gmail.com> writes:
> I'm running a COPY of a 37G CSV and receiving the following error:
> "invalid string enlargement request size 65536"

AFAICS this would only happen if you've got an individual line of COPY
data exceeding 1GB.  (PG versions later than 8.2 give a slightly more
helpful "out of memory" error in such a case.)

Most likely, that was not your intention, and the real problem is
incorrect quoting/escaping in the CSV file, causing COPY to think
that a large number of physical lines should be read as one logical line.
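
For what it's worth, a crude way to hunt for that kind of line is to count the quote characters per physical line; a well-formed CSV record should contain an even number of them. Assuming the file is called data.csv and uses " as the quote character, something like

awk -F'"' 'NF % 2 == 0 { print NR }' data.csv | head

prints the first few lines with an odd number of quotes. Fields with legitimate embedded newlines will also trip it, so treat the output as a hint rather than a verdict.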

            regards, tom lane

Adonias Malosso wrote:
> Hi all,
>

> split --lines=10000000
>
> And running the COPY I receive the error on the 5th file:
>
> psql:/srv/www/htdocs/import/script_q2.sql:122: ERROR:  invalid string
> enlargement request size 65536
> CONTEXT:  COPY temp_q2, line 3509639: ""000000009367276";"4";"DANIEL DO
> CARMO BARROS";"31-Jan-1986";"M";"1";"10";"3162906";"GILSON TEIXEIRA..."
>
> Any clues?

Quote problems earlier in the file?
A missing closing quote?
A \ at the end of a field negating the closing quote?

I'd keep splitting to help isolate the bad line(s) - what control do you have over
the generation of the data?
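
If you want to peek at the reported spot directly, something along these lines might help (assuming split's default output names, so the 5th chunk is xae; note that the line number COPY reports is a logical line, so it may be slightly off if fields contain newlines):

sed -n '3509637,3509641p' xae | cat -A

cat -A makes stray backslashes and control characters visible, which is usually enough to spot a broken quote.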

Is this a one-off import or an ongoing one?

> My PostgreSQL version is 8.2.4; the server is running SUSE Linux with 1.5GB of RAM.
> The relevant changes in postgresql.conf are:
>
> shared_buffers = 512MB
> temp_buffers = 256MB
> checkpoint_segments = 60
>
> I'd also like to know if there's any way to optimize huge data loads in
> operations like these.

Sounds like you are already using COPY. Where from? Is the data file on
the server or a separate client? (As in: are you reading from the same disk
that you are writing the data to?)

See if http://pgfoundry.org/projects/pgbulkload/ can help

It depends a lot on what you are doing and what table you are importing
into. Indexes will most likely be the biggest slowdown; it is faster to
create them after the table is filled. FK constraints can slow things down
as well.

Is this a live server that will still be working as you load data?

If the DB is not in use, try dropping all indexes (on the relevant table,
anyway), loading, then creating the indexes.
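
A rough sketch of that, with the index name and column made up, the delimiter and quote guessed from the error context above, and xaa being the first chunk under split's default naming:

psql <<'SQL'
-- ix_temp_q2_name is a made-up index name; repeat for each index on the table
DROP INDEX ix_temp_q2_name;
\copy temp_q2 from 'xaa' with delimiter ';' csv quote '"'
-- rebuild once the data is in
CREATE INDEX ix_temp_q2_name ON temp_q2 (some_column);
SQL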

You can COPY into a temp table without indexes, then select into the
target table.
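
A minimal sketch of that approach, with the staging table stage_q2 and the final table q2 both hypothetical names (LIKE copies the column definitions but not the indexes):

psql <<'SQL'
-- stage_q2 inherits q2's columns but none of its indexes or FKs
CREATE TEMP TABLE stage_q2 (LIKE q2);
\copy stage_q2 from 'xaa' with delimiter ';' csv quote '"'
INSERT INTO q2 SELECT * FROM stage_q2;
SQL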

What FK constraints does this table have? Can they be safely deferred
during the import?
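
Deferring only helps if the constraints were declared DEFERRABLE; in that case the checks can be pushed to commit time, roughly like this (table and file names hypothetical again):

psql <<'SQL'
BEGIN;
-- only affects constraints declared DEFERRABLE
SET CONSTRAINTS ALL DEFERRED;
\copy q2 from 'xaa' with delimiter ';' csv quote '"'
-- FK checks fire here, at commit
COMMIT;
SQL

Otherwise it is usually quicker to drop the FK before the load and re-add it afterwards, since re-adding validates the whole table in one pass.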


--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz