Memory exhaustion during bulk insert - Mailing list pgsql-hackers

From Xin Wang
Subject Memory exhaustion during bulk insert
Date
Msg-id 49E57169.1030104@gmail.com
Responses Re: Memory exhaustion during bulk insert  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Memory exhaustion during bulk insert  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-hackers
Hi all,

I'm working on an experimental project that uses Postgres as the prototype.
I want to store attribute values of xml type in an internal XML table
that is created for every XML column. Each XML node (element,
attribute or text) is stored as one tuple in that XML table. When I
store a 127MB XML document, 'dblp.xml' (about 4 million XML nodes,
hence 4 million tuples), 2GB of memory is exhausted rapidly and my
computer hangs. My guess is that memory runs out before the transaction
commits, because so many tuples are inserted within a single
transaction.

The tuple insertion flow and the functions called are as follows:

while (get the next XML node != NULL)
{
    /* fill in values and isnull array */
    ...
    tup = heap_form_tuple(tupleDesc, values, isnull);
    simple_heap_insert(xmlTable, tup);
    ...
    heap_freetuple(tup);
}
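
For illustration only, a standalone sketch (not backend code) of the concern behind the loop above: if some per-node working memory is allocated on every iteration and only released at the end of the transaction, the total grows with the number of nodes, whereas resetting a short-lived scratch area after each node keeps the peak bounded. The Arena type and every function name here are hypothetical stand-ins invented for the sketch. Whether the real loop leaks this way, and whether resetting a per-tuple memory context (for example with MemoryContextReset) would be the matching fix in the backend, are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy bump allocator standing in for a short-lived scratch area.
 * Hypothetical illustration: none of these names are PostgreSQL APIs. */
typedef struct Arena
{
    char   *base;   /* backing storage */
    size_t  cap;    /* total bytes available */
    size_t  used;   /* bytes handed out since the last reset */
    size_t  peak;   /* high-water mark of 'used' */
} Arena;

static int
arena_init(Arena *a, size_t cap)
{
    a->base = malloc(cap);
    a->cap = cap;
    a->used = 0;
    a->peak = 0;
    return a->base != NULL;
}

static void *
arena_alloc(Arena *a, size_t n)
{
    void *p;

    n = (n + 7) & ~(size_t) 7;      /* round up to a multiple of 8 */
    if (n > a->cap - a->used)
        return NULL;                /* scratch area exhausted */
    p = a->base + a->used;
    a->used += n;
    if (a->used > a->peak)
        a->peak = a->used;
    return p;
}

static void
arena_reset(Arena *a)
{
    a->used = 0;                    /* drop everything allocated so far */
}

static void
arena_free(Arena *a)
{
    free(a->base);
    a->base = NULL;
}

/* Simulate processing 'nodes' nodes, each needing 'per_node' bytes of
 * scratch memory. Returns the peak number of bytes in use, or
 * (size_t) -1 if the scratch area runs out. */
static size_t
simulate_bulk_insert(size_t nodes, size_t per_node, size_t cap, int reset_each)
{
    Arena   a;
    size_t  i;
    size_t  peak;

    if (!arena_init(&a, cap))
        return (size_t) -1;

    for (i = 0; i < nodes; i++)
    {
        void *scratch = arena_alloc(&a, per_node);

        if (scratch == NULL)
        {
            arena_free(&a);
            return (size_t) -1;
        }
        memset(scratch, 0, per_node);   /* stand-in for per-node work */

        if (reset_each)
            arena_reset(&a);            /* release per-node memory now */
    }

    peak = a.peak;
    arena_free(&a);
    return peak;
}
```

With a 4096-byte scratch area and 64 bytes per node, the variant that resets after every node never uses more than 64 bytes no matter how many nodes are processed, while the variant that never resets runs out after 64 nodes.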

I searched the mailing list archive and noticed that a patch to improve
bulk insert performance was committed in Nov 2008. The log message said

"(the patch) keeps the current target buffer pinned and make it work
in a small ring of buffers to avoid having bulk inserts trash the whole
buffer arena."

However, I do not know much about the code below the heapam layer. Can that
patch solve my problem (I am using version 8.3.5)? Or could you give me
some suggestions on how to avoid memory exhaustion during bulk insert
(while still cleaning up nicely after a transaction abort)?

Thanks in advance.
Regards,

