Thread: COPY problem
I think this is a question regarding the backend, but... I'm in the process of changing 1 large table (column-wise) into 6 smaller tables, and ran into a situation. I'm using PostgreSQL 7.1beta5, Pg as included, Perl 5.6, and Solaris 2.6 on an Ultra 5.

The new setup is 6 tables: the 'main' table loc with id SERIAL, while the other 5 (u, b, v, r, i) are identical, with id INT4 REFERENCES loc. What I've done is copy the original table into a file, and I am now attempting to COPY FROM stdin, using Perl/Pg to break the data out into the 6 tables. I'm working with 2.5 million records, btw.

I've narrowed the situation down to copying into any one of the 5 referring tables (COPY u FROM stdin). The backend process which handles the db connection decides that it needs a whole lot of memory, although in a nice, controlled manner: it starts out using 6.5Mb, and by 25000 records copied it has taken 10Mb and slowed down substantially. Needless to say, this COPY will not finish before running out of memory (estimated 300Mb). When executing the COPY into the loc table, this problem does not occur.

Am I going to have to resort to INSERTs for the referring tables?

Robert Creager
Senior Software Engineer
Client Server Library
303.673.2365 V  303.661.5379 F  888.912.4458 P
StorageTek
INFORMATION made POWERFUL
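For concreteness, the layout described above looks roughly like this (only the `id` columns are named in the post; the payload column here is hypothetical):

```sql
-- 'main' table: COPY into this one behaves fine
CREATE TABLE loc (
    id  SERIAL PRIMARY KEY
    -- remaining columns from the original wide table
);

-- the 5 referring tables u, b, v, r, i are identical in shape
CREATE TABLE u (
    id  INT4 REFERENCES loc,
    val FLOAT8   -- hypothetical payload column
);

-- this is the statement that balloons the backend's memory
COPY u FROM stdin;
```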
"Creager, Robert S" <CreagRS@LOUISVILLE.STORTEK.COM> writes:
> What I've done is copy the original table into a file, and am now attempting
> to copy from stdin, using Perl/Pg to break out the data into the 6 tables.
> I'm working with 2.5 million records btw. I've narrowed the situation to
> occur when copying to any one of the 5 referring tables (COPY u FROM stdin).
> The backend process which handles the db connection decides that it needs a
> whole lot of memory, although in a nice controlled manner. The backend
> starts with using 6.5Mb, and at 25000 records copied, it's taken 10Mb and
> has slowed down substantially. Needless to say, this COPY will not finish
> before running out of memory (estimated 300Mb).

Ah, another memory leak that's yet un-plugged. Can you gin up a self-contained example that reproduces the leak? Should be fixable if we can figure out exactly where the leak is occurring.

regards, tom lane
In article <10FE17AD5F7ED31188CE002048406DE8514CEE@lsv-msg06.stortek.com>, "Creager, Robert S" <CreagRS@louisville.stortek.com> wrote:
> I think this is a question regarding the backend, but...
[snip]
> (COPY u FROM stdin). The backend process which handles the db connection
> decides that it needs a whole lot of memory, although in a nice
> controlled manner. The backend starts with using 6.5Mb, and at 25000
> records copied, it's taken 10Mb and has slowed down substantially.
> Needless to say, this COPY will not finish before running out of memory
> (estimated 300Mb). When executing the COPY to the loc table, this
> problem does not occur. Am I going to have to resort to inserts for the
> referring tables?

I can't answer the backend question, but how about running 'split' on the big file, then COPYing these smaller files?

Gordon.

--
It doesn't get any easier, you just go faster. -- Greg LeMond
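Gordon's suggestion, sketched with a small stand-in file (the real dump is 2.5M lines; the file names and batch size here are arbitrary):

```shell
# stand-in for the big COPY dump
seq 1 1000 > u.dat

# break it into 100-line pieces: chunk.aa, chunk.ab, ...
split -l 100 u.dat chunk.

ls chunk.* | wc -l    # 10 pieces

# each piece could then be loaded with its own COPY, e.g.
#   for f in chunk.*; do psql mydb -c "COPY u FROM stdin" < "$f"; done
```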
Tom believes there may be a memory leak, which would be causing the (strangely enough) memory problem. I didn't think about reducing the import size. What I might try in that case would be to re-connect to the db periodically, rather than splitting the file. The problem becomes unmanageable after around 15000 entries, so splitting 2.5M lines wouldn't be pleasant (167 files).

Thanks,
Rob

> -----Original Message-----
> From: Gordon A. Runkle [mailto:gar@no-spam-integrated-dynamics.com]
> Sent: Friday, March 09, 2001 7:18 PM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] COPY problem
>
> I can't answer the backend question, but how about running
> 'split' on the big file, then COPYing these smaller files?
>
> Gordon.
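The reconnect-per-batch idea might look something like this with the old Pg module (the `putline`/`endcopy` calls follow that module's libpq-style COPY interface; the file name, database name, and batch size are made up):

```perl
use Pg;

my $batch = 10_000;                        # rows per connection; tune to taste
my $conn  = Pg::connectdb("dbname=mydb");  # hypothetical dbname
my $n     = 0;

open(IN, '<', 'u.dat') or die "u.dat: $!";
$conn->exec("COPY u FROM stdin");
while (my $line = <IN>) {
    $conn->putline($line);
    next if ++$n % $batch;
    # end this COPY, drop the connection, start a fresh one
    $conn->putline("\\.\n");
    $conn->endcopy;
    $conn = Pg::connectdb("dbname=mydb");
    $conn->exec("COPY u FROM stdin");
}
$conn->putline("\\.\n");
$conn->endcopy;
close(IN);
```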
"Creager, Robert S" <CreagRS@LOUISVILLE.STORTEK.COM> writes:
> I'm working with 2.5 million records btw. I've narrowed the situation to
> occur when copying to any one of the 5 referring tables (COPY u FROM stdin).
> The backend process which handles the db connection decides that it needs a
> whole lot of memory, although in a nice controlled manner. The backend
> starts with using 6.5Mb, and at 25000 records copied, it's taken 10Mb and
> has slowed down substantially. Needless to say, this COPY will not finish
> before running out of memory (estimated 300Mb). When executing the COPY to
> the loc table, this problem does not occur. Am I going to have to resort to
> inserts for the referring tables?

It turns out that the main bug here is not that you see a memory leak for the referring tables, but that you fail to see one for the referred table :-(. We implement foreign key constraints via AFTER EVENT triggers, and the current implementation of such triggers requires saving information about each insert/update/delete event occurring during the current transaction. So that's where the memory is going.

I noticed that the trigger code was being a little bit stupid about calculating the amount of memory it needed to allocate, so I modified that, for a net savings of perhaps a third of the per-tuple memory. But that's about as much as we can do about the issue for 7.1.

I think the code could be made smarter --- in particular, I doubt that it's really necessary to save info about INSERT events when you have only AFTER UPDATE and/or AFTER DELETE triggers. But this seems too delicate a change to risk making at this point in the 7.1 cycle. We'll just have to live with it for a while longer. In the meantime, I'd suggest limiting the number of tuples that you insert per transaction.

regards, tom lane
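Since the queued trigger events live only for the duration of the current transaction, Tom's workaround amounts to loading the data in several smaller COPY commands rather than one big one, so each COMMIT releases the queue. A rough sketch (batch boundaries arbitrary):

```sql
BEGIN;
COPY u FROM stdin;   -- first batch of rows, e.g. 10000
COMMIT;              -- AFTER-trigger event queue is discarded here

BEGIN;
COPY u FROM stdin;   -- next batch
COMMIT;
-- and so on until all 2.5M rows are loaded
```

Note that a COPY issued outside an explicit transaction block is its own transaction anyway, so simply issuing many separate COPY commands has the same effect; the BEGIN/COMMIT just makes the boundary visible.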