Thread: COPY problem
I think this is a question regarding the backend, but... I'm in the process of changing 1 large table (column-wise) into 6 smaller tables, and ran into a situation. I'm using PostgreSQL 7.1beta5, Pg as included, Perl 5.6, and Solaris 2.6 on an Ultra 5.

The new setup is 6 tables: the 'main' table loc with id SERIAL, while the other 5 (u, b, v, r, i) are identical, with id INT4 REFERENCES loc. What I've done is copy the original table into a file, and I am now attempting to COPY FROM stdin, using Perl/Pg to break the data out into the 6 tables. I'm working with 2.5 million records, btw.

I've narrowed the situation down to copying into any one of the 5 referring tables (COPY u FROM stdin). The backend process which handles the db connection decides that it needs a whole lot of memory, although in a nice, controlled manner: it starts out using 6.5Mb, and by 25000 records copied it has taken 10Mb and slowed down substantially. Needless to say, this COPY will not finish before running out of memory (estimated 300Mb). When executing the COPY into the loc table, this problem does not occur.

Am I going to have to resort to INSERTs for the referring tables?

Robert Creager
Senior Software Engineer
Client Server Library
303.673.2365 V  303.661.5379 F  888.912.4458 P
StorageTek
INFORMATION made POWERFUL
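For concreteness, the layout described above looks roughly like this (only the `id` columns are named in the post; the payload column here is hypothetical):

```sql
-- 'main' table: COPY into this one behaves fine
CREATE TABLE loc (
    id  SERIAL PRIMARY KEY
    -- remaining columns from the original wide table
);

-- the 5 referring tables u, b, v, r, i are identical in shape
CREATE TABLE u (
    id  INT4 REFERENCES loc,
    val FLOAT8   -- hypothetical payload column
);

-- this is the statement that balloons the backend's memory
COPY u FROM stdin;
```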
"Creager, Robert S" <CreagRS@LOUISVILLE.STORTEK.COM> writes:
> What I've done is copy the original table into a file, and am now attempting
> to copy from stdin, using Perl/Pg to break out the data into the 6 tables.
> I'm working with 2.5 million records btw. I've narrowed the situation to
> occur when copying to any one of the 5 referring tables (COPY u FROM stdin).
> The backend process which handles the db connection decides that it needs a
> whole lot of memory, although in a nice controlled manner. The backend
> starts with using 6.5Mb, and at 25000 records copied, it's taken 10Mb and
> has slowed down substantially. Needless to say, this COPY will not finish
> before running out of memory (estimated 300Mb).

Ah, another memory leak that's yet un-plugged. Can you gin up a self-contained example that reproduces the leak? Should be fixable if we can figure out exactly where the leak is occurring.

regards, tom lane
In article <10FE17AD5F7ED31188CE002048406DE8514CEE@lsv-msg06.stortek.com>, "Creager, Robert S" <CreagRS@louisville.stortek.com> wrote:
> I think this is a question regarding the backend, but...
[snip]
> (COPY u FROM stdin). The backend process which handles the db connection
> decides that it needs a whole lot of memory, although in a nice
> controlled manner. The backend starts with using 6.5Mb, and at 25000
> records copied, it's taken 10Mb and has slowed down substantially.
> Needless to say, this COPY will not finish before running out of memory
> (estimated 300Mb). When executing the COPY to the loc table, this
> problem does not occur. Am I going to have to resort to inserts for the
> referring tables?

I can't answer the backend question, but how about running 'split' on the big file, then COPYing these smaller files?

Gordon.

--
It doesn't get any easier, you just go faster. -- Greg LeMond
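Gordon's suggestion, sketched with a small stand-in file (the real dump is 2.5M lines; the file names and batch size here are arbitrary):

```shell
# stand-in for the big COPY dump
seq 1 1000 > u.dat

# break it into 100-line pieces: chunk.aa, chunk.ab, ...
split -l 100 u.dat chunk.

ls chunk.* | wc -l    # 10 pieces

# each piece could then be loaded with its own COPY, e.g.
#   for f in chunk.*; do psql mydb -c "COPY u FROM stdin" < "$f"; done
```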
Tom believes there may be a memory leak, which would be causing the (strangely enough) memory problem. I didn't think about reducing the import size. What I might try in that case would be to re-connect to the db periodically, rather than splitting the file. The problem becomes unmanageable after around 15000 entries, so splitting 2.5M lines wouldn't be pleasant (167 files).

Thanks,
Rob

> -----Original Message-----
> From: Gordon A. Runkle [mailto:gar@no-spam-integrated-dynamics.com]
> Sent: Friday, March 09, 2001 7:18 PM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] COPY problem
>
> I can't answer the backend question, but how about running
> 'split' on the big file, then COPYing these smaller files?
>
> Gordon.
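The reconnect-per-batch idea might look something like this with the old Pg module (the `putline`/`endcopy` calls follow that module's libpq-style COPY interface; the file name, database name, and batch size are made up):

```perl
use Pg;

my $batch = 10_000;                        # rows per connection; tune to taste
my $conn  = Pg::connectdb("dbname=mydb");  # hypothetical dbname
my $n     = 0;

open(IN, '<', 'u.dat') or die "u.dat: $!";
$conn->exec("COPY u FROM stdin");
while (my $line = <IN>) {
    $conn->putline($line);
    next if ++$n % $batch;
    # end this COPY, drop the connection, start a fresh one
    $conn->putline("\\.\n");
    $conn->endcopy;
    $conn = Pg::connectdb("dbname=mydb");
    $conn->exec("COPY u FROM stdin");
}
$conn->putline("\\.\n");
$conn->endcopy;
close(IN);
```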
"Creager, Robert S" <CreagRS@LOUISVILLE.STORTEK.COM> writes:
> I'm working with 2.5 million records btw. I've narrowed the situation to
> occur when copying to any one of the 5 referring tables (COPY u FROM stdin).
> The backend process which handles the db connection decides that it needs a
> whole lot of memory, although in a nice controlled manner. The backend
> starts with using 6.5Mb, and at 25000 records copied, it's taken 10Mb and
> has slowed down substantially. Needless to say, this COPY will not finish
> before running out of memory (estimated 300Mb). When executing the COPY to
> the loc table, this problem does not occur. Am I going to have to resort to
> inserts for the referring tables?

It turns out that the main bug here is not that you see a memory leak for the referring tables, but that you fail to see one for the referred table :-(. We implement foreign key constraints via AFTER EVENT triggers, and the current implementation of such triggers requires saving information about each insert/update/delete event occurring during the current transaction. So that's where the memory is going.

I noticed that the trigger code was being a little bit stupid about calculating the amount of memory it needed to allocate, so I modified that, for a net savings of perhaps a third of the per-tuple memory. But that's about as much as we can do about the issue for 7.1.

I think the code could be made smarter --- in particular, I doubt that it's really necessary to save info about INSERT events when you have only AFTER UPDATE and/or AFTER DELETE triggers. But this seems too delicate a change to risk making at this point in the 7.1 cycle. We'll just have to live with it for a while longer. In the meantime, I'd suggest limiting the number of tuples that you insert per transaction.

regards, tom lane
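Since the queued trigger events live only for the duration of the current transaction, Tom's workaround amounts to loading the data in several smaller COPY commands rather than one big one, so each COMMIT releases the queue. A rough sketch (batch boundaries arbitrary):

```sql
BEGIN;
COPY u FROM stdin;   -- first batch of rows, e.g. 10000
COMMIT;              -- AFTER-trigger event queue is discarded here

BEGIN;
COPY u FROM stdin;   -- next batch
COMMIT;
-- and so on until all 2.5M rows are loaded
```

Note that a COPY issued outside an explicit transaction block is its own transaction anyway, so simply issuing many separate COPY commands has the same effect; the BEGIN/COMMIT just makes the boundary visible.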