Thread: extreme memory use when loading a lot of data
I have some historical data that I want to analyze. To do this I set up postgres on a spare box I picked up for cheap, which just lucked into having tons of RAM (1.5GB). I set up postgres to use 10000 buffers, and recompiled the kernel to allow a 2GB data size limit per process.

Since this is historical data, I'm actually merging a couple of dumps that span the time range. I've dealt with eliminating any conflicting data (i.e., clashing unique keys), but I'm not 100% sure that the foreign key constraints are all met. Thus, when loading the data from the second dump, I am leaving the FK triggers on.

Now, this is where my trouble begins... On importing row 29,796,801 of the first big table, I get this (after 27 hours!):

    pg_restore: ERROR:  out of memory
    DETAIL:  Failed on request of size 44.
    CONTEXT:  COPY msg_recipients, line 29796801: "75312 3434358 sent"
    pg_restore: [archiver (db)] error returned by PQendcopy

(I had originally tried this with a 1GB data size limit on the process, and it died at around row 15 million.)

The curious thing, watching the process size grow in another window, is that it shrank considerably a few minutes before it croaked:

    % ps axuw | grep postg
    pgsql  26032  6.0  7.9 2186888 123440  ??  D    Thu09AM  122:26.10 postmaster: khera vkmlm [local] COPY (postgres)

Note here that we are using nearly 2GB of virtual memory (I'm sure a bunch of that is stack and the shared memory segment). A few minutes later I saw this:

    % ps axuw | grep postg
    pgsql  26032 10.4  5.5  91840  85624  ??  S    Thu09AM  123:17.24 postmaster: khera vkmlm [local] idle (postgres)

Note the process size is down to 91MB. A few minutes later I got the out-of-memory error. This is very curious, because I don't expect the process to release memory back to the OS like that.

There are about 157 million records in this table, three columns wide.

FreeBSD 4.10-PRERELEASE, PostgreSQL 7.4.2.

So, is there any way to load this in, or do I need to either break it into chunks (no small feat) or disable triggers during the load and hope and pray there are no FK violations? Is there some leak, or does it really take that much RAM to load data into a table?

--
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD  +1-301-869-4449 x806
AIM: vivekkhera  Y!: vivek_khera  http://www.khera.org/~vivek/
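The "disable triggers during the load" route is, on 7.x, usually done by zeroing the table's trigger count in the system catalogs, the same trick pg_dump's --disable-triggers option used in that era. A minimal sketch, assuming the table name from the error message above (requires superuser, and of course leaves the FK unverified):

    -- Disable all triggers on the table, including the FK check triggers:
    UPDATE pg_class SET reltriggers = 0 WHERE relname = 'msg_recipients';

    -- ... run the COPY / pg_restore here ...

    -- Re-enable by restoring the real trigger count:
    UPDATE pg_class
       SET reltriggers = (SELECT count(*) FROM pg_trigger
                           WHERE tgrelid = pg_class.oid)
     WHERE relname = 'msg_recipients';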
On Fri, 21 May 2004, Vivek Khera wrote:

> I have some historical data that I want to analyze. To do this I set up
> postgres on a spare box I picked up for cheap, which just lucked into
> having tons of RAM (1.5GB). I set up postgres to use 10000 buffers, and
> recompiled the kernel to allow a 2GB data size limit per process.
>
> Since this is historical data, I'm actually merging a couple of dumps
> that span the time range. I've dealt with eliminating any conflicting
> data (i.e., clashing unique keys), but I'm not 100% sure that the
> foreign key constraints are all met. Thus, when loading the data from
> the second dump, I am leaving the FK triggers on.

I'd suggest dropping the constraints, adding the data, and adding the constraints again. On 7.4 checking the constraint that way will be faster, and if the constraint turns out not to be satisfied, you only need to remove the offending rows and recreate the constraint; that's better than having to reimport.

> Now, this is where my trouble begins... On importing row 29,796,801 of
> the first big table, I get this (after 27 hours!):

I'd wonder if some large portion of that memory is the deferred trigger queue, which does not yet spill to disk when it gets too large.
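In SQL, the suggested sequence looks roughly like this (the constraint, column, and parent-table names are invented here, since the real schema isn't shown in the thread):

    -- Drop the FK so COPY doesn't queue a trigger firing per row:
    ALTER TABLE msg_recipients DROP CONSTRAINT msg_recipients_msg_fk;

    -- ... bulk-load the data with COPY / pg_restore ...

    -- Re-add it; 7.4 validates the existing rows in a single pass:
    ALTER TABLE msg_recipients
        ADD CONSTRAINT msg_recipients_msg_fk
        FOREIGN KEY (msg_id) REFERENCES messages (msg_id);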
Vivek Khera <khera@kcilink.com> writes:

> Since this is historical data, I'm actually merging a couple of dumps
> that span the time range. I've dealt with eliminating any conflicting
> data (i.e., clashing unique keys), but I'm not 100% sure that the
> foreign key constraints are all met. Thus, when loading the data from
> the second dump, I am leaving the FK triggers on.

I think you'd be better off dropping the FK constraint, importing, and re-adding the constraint. The out-of-memory problem is probably due to the list of deferred trigger firings (one per tuple, or more if you have multiple FKs to check). Even if you had enough memory, you'd not have enough patience for all those retail FK checks to run after the COPY finishes.

At least in 7.4, adding an FK constraint to an existing table should produce a better plan than the retail checks involved in adding rows to a table with an existing FK constraint.

			regards, tom lane
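That single validation pass is essentially an anti-join looking for child rows with no matching parent, which is also a handy way to find (and delete) the offending rows before recreating the constraint. A sketch, using the same invented names as above:

    -- Child rows with a non-NULL key and no matching parent violate the FK:
    SELECT r.*
      FROM msg_recipients r
      LEFT JOIN messages m ON m.msg_id = r.msg_id
     WHERE r.msg_id IS NOT NULL
       AND m.msg_id IS NULL;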
>> key constraints are all met. Thus, when loading the data from the
>> second dump, I am leaving the FK triggers on.
>
> I'd suggest dropping the constraints, adding the data, and adding the
> constraints again. On 7.4 checking the constraint that way will be
> faster, and if the constraint turns out not to be satisfied, you only
> need to remove the offending rows and recreate the constraint; that's
> better than having to reimport.

Thanks to Stephan and Tom for the same suggestion. I'm also glad to know it is not a leak, just a large amount of memory use.

The load is still taking a LOOONG time because it is just a lowly IDE disk (but it is UDMA100)... 96 hours so far, and the memory usage is steady.