Thread: Memory use in 8.3 plpgsql with heavy use of xpath()
I've been using Postgres 8.3 in a production environment for some time now, and have noticed a difference in long-term non-shared memory use by Postgres since upgrading the system from 8.1. After initially making the switch to 8.3 and making only one schema change (changed a text column to xml) the server ran and used memory more or less the same as it had in 8.1. Later, I added a large set of plpgsql trigger functions that operate on that new xml column data, using the xpath() function to extract bits of XML and populate them into normal tables. The server has been running in this fashion for many months now, and there is a noticeable difference in how Postgres is using memory now, in that over time it's non-shared memory use is climbing higher and higher. Right now I'm tracking this from data captured by Munin on the system. The memory creep is very slight, but over many months is easy to discern. I was wondering if anyone else out there is making heavy use of xpath() and related functions in plpgsql functions, and if they've noticed a trend like this? Is this normal for Postgres to accumulate non-shared memory over time, up to some configured maximum? A second question I have about is about plpgsql and memory use in general. If I run some of these plpgsql functions using a lot of xpath() calls on large sets of data, huge amounts of memory are consumed (gigabytes) and the memory seems to accumulate until the transaction the functions are running in completes. Do invocations of plpgsql functions retain memory they used until the end of the transaction? -- m@
"Matt Magoffin" <postgresql.org@msqr.us> writes: > Later, I added a large set of plpgsql trigger functions that operate on > that new xml column data, using the xpath() function to extract bits of > XML and populate them into normal tables. The server has been running in > this fashion for many months now, and there is a noticeable difference in > how Postgres is using memory now, in that over time it's non-shared memory > use is climbing higher and higher. Right now I'm tracking this from data > captured by Munin on the system. The memory creep is very slight, but over > many months is easy to discern. Ugh. Sounds like "small memory leak inside libxml2" --- probably not going to be easy to find. Can you put together a self-contained test case? regards, tom lane
> Ugh. Sounds like "small memory leak inside libxml2" --- probably not > going to be easy to find. Can you put together a self-contained test > case? OK, I'll try to come up with something. Do you have a recommended way of capturing the amount memory being used by Postgres related to this? I was thinking I would have a plpgsql function that loops a large number of times, calling a few xpath() calls, and could take a memory snapshot before/after the function completes. Is there a call within Postgres I can make, or should I just rely on OS-level tools for this? -- m@
"Matt Magoffin" <postgresql.org@msqr.us> writes: >> Ugh. Sounds like "small memory leak inside libxml2" --- probably not >> going to be easy to find. Can you put together a self-contained test >> case? > OK, I'll try to come up with something. Do you have a recommended way of > capturing the amount memory being used by Postgres related to this? I was > thinking I would have a plpgsql function that loops a large number of > times, calling a few xpath() calls, Yeah, that's what I'd try first. regards, tom lane
>> OK, I'll try to come up with something. Do you have a recommended way of >> capturing the amount memory being used by Postgres related to this? I >> was >> thinking I would have a plpgsql function that loops a large number of >> times, calling a few xpath() calls, > > Yeah, that's what I'd try first. > > regards, tom lane Below is a test case that simulates the use of xpath() within a plpgsql function in my application. I'm not sure of a good way to measure the retained memory before/after the script runs, however. I ran this several times and the postgres process that ran it does have more memory allocated afterwards than before, but I don't know what is expected and what isn't. You can adjust the number of loops that run at num_loops int8 := 1000000; -- m@ ----- DROP TABLE IF EXISTS tmp_xml_test CASCADE; DROP TABLE IF EXISTS tmp_xml_addr CASCADE; CREATE TEMPORARY TABLE tmp_xml_test( id int4, x xml, primary key (id)); CREATE TEMPORARY TABLE tmp_xml_addr( id int4, xmlpos int4, street text, city text, state text, zip text, primary key (id, xmlpos)); CREATE OR REPLACE FUNCTION tmp_extract_address(data_row tmp_xml_test) RETURNS void AS $BODY$ DECLARE addr_row tmp_xml_addr%ROWTYPE; tmp_txt text; tmp_array xml[]; BEGIN addr_row.id := data_row.id; DELETE FROM tmp_xml_addr WHERE id = data_row.id; tmp_array := xpath( '/po:purchaseOrder/*[name(.) = "shipTo" or name(.) = "billTo"]', data_row.x, ARRAY[ARRAY['po', 'http://www.example.com/PO1']]); IF array_upper(tmp_array, 1) > 0 THEN FOR idx IN 1..array_upper(tmp_array, 1) LOOP addr_row.xmlpos := idx; addr_row.street := upper(XMLSERIALIZE(CONTENT(xpath( '/po:purchaseOrder/*[name(.) = "shipTo" or name(.) = "billTo"][' ||idx|| ']/po:street[1]/text()', data_row.x, ARRAY[ARRAY['po', 'http://www.example.com/PO1']]))[1] as text)); addr_row.city := upper(XMLSERIALIZE(CONTENT(xpath( '/po:purchaseOrder/*[name(.) = "shipTo" or name(.) = "billTo"][' ||idx|| ']/po:city[1]/text()', data_row.x, ARRAY[ARRAY['po', 'http://www.example.com/PO1']]))[1] as text)); addr_row.state := upper(XMLSERIALIZE(CONTENT(xpath( '/po:purchaseOrder/*[name(.) = "shipTo" or name(.) = "billTo"][' ||idx|| ']/po:state[1]/text()', data_row.x, ARRAY[ARRAY['po', 'http://www.example.com/PO1']]))[1] as text)); addr_row.zip := upper(XMLSERIALIZE(CONTENT(xpath( '/po:purchaseOrder/*[name(.) = "shipTo" or name(.) = "billTo"][' ||idx|| ']/po:zip[1]/text()', data_row.x, ARRAY[ARRAY['po', 'http://www.example.com/PO1']]))[1] as text)); INSERT INTO tmp_xml_addr (id, xmlpos, street, city, state, zip) VALUES (addr_row.id, addr_row.xmlpos, addr_row.street, addr_row.city, addr_row.state, addr_row.zip); END LOOP; END IF; RETURN; END; $BODY$ LANGUAGE 'plpgsql'; CREATE OR REPLACE FUNCTION tmp_populate_addr() RETURNS trigger AS $BODY$ BEGIN PERFORM tmp_extract_address(NEW); RETURN NEW; END; $BODY$ LANGUAGE 'plpgsql'; CREATE TRIGGER tmp_populate_addr_trigger AFTER INSERT OR UPDATE ON tmp_xml_test FOR EACH ROW EXECUTE PROCEDURE tmp_populate_addr(); DROP FUNCTION IF EXISTS tmp_test_loop(); CREATE OR REPLACE FUNCTION tmp_test_loop() RETURNS SETOF tmp_xml_addr AS $BODY$ DECLARE num_loops int8 := 1000000; BEGIN FOR idx IN 1..num_loops LOOP INSERT INTO tmp_xml_test VALUES (idx, $$<purchaseOrder xmlns="http://www.example.com/PO1" orderDate="1999-10-20"> <shipTo country="US"> <name>Alice Smith</name> <street>123 Maple Street</street> <city>Mill Valley</city> <state>CA</state> <zip>90952</zip> </shipTo> <billTo country="US"> <name>Robert Smith</name> <street>8 Oak Avenue</street> <city>Old Town</city> <state>PA</state> <zip>95819</zip> </billTo> <comment>Hurry, my lawn is going wild!</comment> <items> <item partNum="872-AA"> <productName>Lawnmower</productName> <quantity>1</quantity> <USPrice>148.95</USPrice> <comment>Confirm this is electric</comment> </item> <item partNum="926-AA"> <productName>Baby Monitor</productName> <quantity>1</quantity> <USPrice>39.98</USPrice> <shipDate>1999-05-21</shipDate> </item> </items> </purchaseOrder>$$); END LOOP; FOR idx IN 1..num_loops LOOP UPDATE tmp_xml_test SET id = idx WHERE id = idx; END LOOP; RETURN QUERY SELECT * FROM tmp_xml_addr ORDER BY id, xmlpos; END; $BODY$ LANGUAGE 'plpgsql'; SELECT * FROM tmp_test_loop();
"Matt Magoffin" <postgresql.org@msqr.us> writes: > Below is a test case that simulates the use of xpath() within a plpgsql > function in my application. I'm able to duplicate the memory leak in this function with the current Fedora 8 libxml2 (2.6.32). The leak is definitely inside libxml2 itself, because the bloat shows up here: LibxmlContext: 150986752 total in 27 blocks; 6025448 free (204 chunks); 144961304 used and that context is only used for allocations requested by libxml2. (We reset that context at every transaction end, so the only way for the problem to become really noticeable in Postgres is to do a whole lot of xpath() calls in a single transaction.) Some idle looking through the context suggests that the leakage consists of short individually palloc'd null-terminated strings that seem to be words from your test data. I tried dumping the context stats at successive entries to xpath(): LibxmlContext: 260038656 total in 40 blocks; 3934120 free (204 chunks); 256104536 used LibxmlContext: 260038656 total in 40 blocks; 3934088 free (204 chunks); 256104568 used LibxmlContext: 260038656 total in 40 blocks; 3934056 free (204 chunks); 256104600 used LibxmlContext: 260038656 total in 40 blocks; 3934056 free (205 chunks); 256104600 used LibxmlContext: 260038656 total in 40 blocks; 3934000 free (205 chunks); 256104656 used LibxmlContext: 260038656 total in 40 blocks; 3933960 free (204 chunks); 256104696 used LibxmlContext: 260038656 total in 40 blocks; 3933928 free (204 chunks); 256104728 used LibxmlContext: 260038656 total in 40 blocks; 3933896 free (204 chunks); 256104760 used LibxmlContext: 260038656 total in 40 blocks; 3933856 free (204 chunks); 256104800 used LibxmlContext: 260038656 total in 40 blocks; 3933816 free (204 chunks); 256104840 used LibxmlContext: 260038656 total in 40 blocks; 3933784 free (204 chunks); 256104872 used LibxmlContext: 260038656 total in 40 blocks; 3933752 free (204 chunks); 256104904 used LibxmlContext: 260038656 total in 40 blocks; 3933752 free (205 chunks); 256104904 used LibxmlContext: 260038656 total in 40 blocks; 3933696 free (205 chunks); 256104960 used LibxmlContext: 260038656 total in 40 blocks; 3933656 free (204 chunks); 256105000 used LibxmlContext: 260038656 total in 40 blocks; 3933624 free (204 chunks); 256105032 used which shows that not every call in this test case leaks memory, but the majority do, and that the leakage runs between 32 and 56 bytes per call. This is on a 64-bit machine with cassert enabled, which means that the palloc overhead is 24 bytes per chunk. So I think we can fairly confidently assert that libxml is leaking exactly one short string per xpath() call --- the leak is too small for it to be more than one palloc chunk at a time. I looked through the libxml2 sources a little bit but couldn't immediately find the problem. I'm fairly confident though that this could be reproduced outside Postgres, by replicating the sequence of libxml2 calls we make in xpath(). The next step should probably be to build a reproducer program and submit it to the libxml authors, or maybe run it under a debugging malloc package that could help identify where the leak is. Anyone want to do that? regards, tom lane
> I looked through the libxml2 sources a little bit but couldn't > immediately find the problem. I'm fairly confident though that > this could be reproduced outside Postgres, by replicating the sequence > of libxml2 calls we make in xpath(). The next step should probably be > to build a reproducer program and submit it to the libxml authors, > or maybe run it under a debugging malloc package that could help > identify where the leak is. Anyone want to do that? Ugh, I'd love to give that a shot but that is a bit outside my comfort zone. -- m@
On Wed, 2 Jul 2008, Tom Lane wrote: > "Matt Magoffin" <postgresql.org@msqr.us> writes: >> Below is a test case that simulates the use of xpath() within a plpgsql >> function in my application. > > I'm able to duplicate the memory leak in this function with the current > Fedora 8 libxml2 (2.6.32). The leak is definitely inside libxml2 > itself, because the bloat shows up here: > I think this should fix it. Kris Jurka
Attachment
>> I'm able to duplicate the memory leak in this function with the current >> Fedora 8 libxml2 (2.6.32). The leak is definitely inside libxml2 >> itself, because the bloat shows up here: >> > > I think this should fix it. > > Kris Jurka It looks like xml.c source has changed considerably since 8.3 (looking at revision 1.68.2.2 from the 8.3.3. release). Do you know where/if this patch would apply to the 8.3 branch? -- m@
"Matt Magoffin" <postgresql.org@msqr.us> writes: > It looks like xml.c source has changed considerably since 8.3 No, hardly at all actually, but this patch happens to be right next door to one of the lines that did change. cstring_to_text() replaces some grottier stuff that used to be used for the same purpose. In a few minutes I'm going to verify whether Kris' fix makes the leak go away for me, and commit to CVS if so. regards, tom lane
>> I think this should fix it. >> >> Kris Jurka > > It looks like xml.c source has changed considerably since 8.3 (looking at > revision 1.68.2.2 from the 8.3.3. release). Do you know where/if this > patch would apply to the 8.3 branch? I diff'ed 1.74 and 1.68.2.2, and I'm guessing this new line could be added after line 3203 in 1.68.2.2? I'll give that a try... -- m@
"Matt Magoffin" <postgresql.org@msqr.us> writes: >>> I think this should fix it. >>> Kris Jurka Confirmed, that makes it go away nicely here: LibxmlContext: 57344 total in 3 blocks; 55720 free (202 chunks); 1624 used >> It looks like xml.c source has changed considerably since 8.3 (looking at >> revision 1.68.2.2 from the 8.3.3. release). Do you know where/if this >> patch would apply to the 8.3 branch? Here's what I just committed to the 8.3 branch: Index: xml.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/adt/xml.c,v retrieving revision 1.68.2.2 diff -c -r1.68.2.2 xml.c *** xml.c 24 Mar 2008 19:12:58 -0000 1.68.2.2 --- xml.c 2 Jul 2008 23:57:20 -0000 *************** *** 3201,3206 **** --- 3201,3207 ---- result = (text *) palloc(len + VARHDRSZ); SET_VARSIZE(result, len + VARHDRSZ); memcpy(VARDATA(result), str, len); + xmlFree(str); } return result; regards, tom lane
"Matt Magoffin" <postgresql.org@msqr.us> writes: > Later, I added a large set of plpgsql trigger functions that operate on > that new xml column data, using the xpath() function to extract bits of > XML and populate them into normal tables. The server has been running in > this fashion for many months now, and there is a noticeable difference in > how Postgres is using memory now, in that over time it's non-shared memory > use is climbing higher and higher. Right now I'm tracking this from data > captured by Munin on the system. The memory creep is very slight, but over > many months is easy to discern. Looking back, I'm not sure that what we just found explains that part of your original report. > If I run some of these plpgsql functions using a lot of xpath() calls on > large sets of data, huge amounts of memory are consumed (gigabytes) and > the memory seems to accumulate until the transaction the functions are > running in completes. This part seems to match the bug though --- the leak is approximately the same size as all the text returned by xpath() within the current transaction. So there may be a second issue remaining to be found. Can you put together a test case for the long-term small leak? regards, tom lane
> This part seems to match the bug though --- the leak is approximately > the same size as all the text returned by xpath() within the current > transaction. > > So there may be a second issue remaining to be found. Can you put > together a test case for the long-term small leak? > > regards, tom lane Hmm, I'm not sure what else to add to this test case. This test case was a good example of what our database is doing with xpath(); it is using quite a number of them, that's all. Is there something else in particular you'd be looking for in another test case? -- m@
"Matt Magoffin" <postgresql.org@msqr.us> writes: >> So there may be a second issue remaining to be found. Can you put >> together a test case for the long-term small leak? > Hmm, I'm not sure what else to add to this test case. This test case was a > good example of what our database is doing with xpath(); it is using quite > a number of them, that's all. Is there something else in particular you'd > be looking for in another test case? Well, you tell me --- *you* reported a behavior that isn't obviously explained by the bug we found. It's possible that what you were seeing was an indirect effect of the now-known bug: if the xpath leak were to occur repeatedly on a large scale in a long-lived session, I think it's possible that memory allocation behavior might suffer due to fragmentation effects. I feel that that's a pretty hand-wavy explanation though. Probably the right thing for you to do now is just to install the known fix, and keep an eye on your server for awhile to see if you still see any indication of the long-term leak behavior. regards, tom lane
> Probably the right thing for you to do now is just to install the known > fix, and keep an eye on your server for awhile to see if you still see > any indication of the long-term leak behavior. Certainly, that is my plan. Once I can get the patch rolled out to these systems, I should be able to see if the memory-creep trend is resolved after a couple of weeks or so. -- m@
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Well, you tell me --- *you* reported a behavior that isn't obviously > explained by the bug we found. In case it wasn't clear, the bug found was a intra-transaction memory leak. When the transaction ended the memory would be reclaimed. That doesn't seem to match the description of long term memory growth. > It's possible that what you were seeing was an indirect effect of the > now-known bug: if the xpath leak were to occur repeatedly on a large > scale in a long-lived session, I think it's possible that memory > allocation behavior might suffer due to fragmentation effects. > I feel that that's a pretty hand-wavy explanation though. Another explanation is that there wasn't a inter-transaction memory leak, it was just that the high water mark would grow whenever a transaction processed more data than previous transactions. If the data set size varies a lot most common distributions would have the majority of data sets be about the same size with a long tail of larger sets. That might like a slow creep as the increases get rarer and rarer but do continue to happen. That's just a special case of what would be expected to happen with memory allocation anyways though. Few allocators return memory to the OS anyways. It might just be exaggerated in this case since probably a significant part of Postgres's footprint here was the per-transaction memory being used by this leak. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
Gregory Stark <stark@enterprisedb.com> writes: > That's just a special case of what would be expected to happen with memory > allocation anyways though. Few allocators return memory to the OS anyways. Well, that does happen on Linux for instance. Since Matt knew in his original report that the xpath leak was intra-transaction, I assumed he must be using a platform where malloc/free can release memory back to the OS --- else he couldn't have seen that behavior from outside the backend. Still, it's entirely possible that some sort of high-water-mark is involved somewhere, perhaps in malloc's internal data structures. regards, tom lane
> Gregory Stark <stark@enterprisedb.com> writes: >> That's just a special case of what would be expected to happen with >> memory >> allocation anyways though. Few allocators return memory to the OS >> anyways. > > Well, that does happen on Linux for instance. Since Matt knew in his > original report that the xpath leak was intra-transaction, I assumed > he must be using a platform where malloc/free can release memory back > to the OS --- else he couldn't have seen that behavior from outside > the backend. > > Still, it's entirely possible that some sort of high-water-mark is > involved somewhere, perhaps in malloc's internal data structures. I was really going on a hunch, as I noticed a definite trend of postgres processes using more and more memory over time, and it only started after switching to 8.3 and starting to use xpath() quite heavily. Most of the memory data I have comes from Linux x64 systems with Postgres compiled as 64-bit. But I did also notice what appeared to be a similar trend on an OS X PPC system. In any event, I'm sorry I can't provide any more helpful tests, but I'll report back how the system changes after I've patched these systems. -- m@
>> Gregory Stark <stark@enterprisedb.com> writes: >>> That's just a special case of what would be expected to happen with >>> memory >>> allocation anyways though. Few allocators return memory to the OS >>> anyways. >> >> Well, that does happen on Linux for instance. Since Matt knew in his >> original report that the xpath leak was intra-transaction, I assumed >> he must be using a platform where malloc/free can release memory back >> to the OS --- else he couldn't have seen that behavior from outside >> the backend. >> >> Still, it's entirely possible that some sort of high-water-mark is >> involved somewhere, perhaps in malloc's internal data structures. > > I was really going on a hunch, as I noticed a definite trend of postgres > processes using more and more memory over time, and it only started after > switching to 8.3 and starting to use xpath() quite heavily. Most of the > memory data I have comes from Linux x64 systems with Postgres compiled as > 64-bit. But I did also notice what appeared to be a similar trend on an OS > X PPC system. > > In any event, I'm sorry I can't provide any more helpful tests, but I'll > report back how the system changes after I've patched these systems. I'm following up on this thread from a month ago on PG 8.3 memory use. I'm afraid even after updating to 8.3.3 + this patch, I still see the same overall memory trend. You can see what I'm looking at here with a couple of memory graphs. The first comes from our old primary database server, which started on 8.1 and then we switched to 8.3. I've added a callout on the chart to when we switched to 8.3. Before the switch, the application memory use (bright green) was pretty consistent, around 250MB. After the switch, you can see the application memory trends upwards over time. http://msqr.us/pg83-mem/pg83-memlead-1.png Around June, we then switched to some new hardware. It was on this machine about a month ago I updated to 8.3.3 + the xml.c patch referred to in this thread. I've pointed this out on this chart: http://msqr.us/pg83-mem/pg83-memlead-2.png Here the application memory continues to trend upwards over time since the 8.3.3 server was started, up to 1.2GB as of now. Am I right in my understanding that in general I should not see this every-increasing use of app memory by PostgreSQL? Or will PostgreSQL slowly work towards consuming all the shared memory we've allocated (4GB in this case) and retain that? The only apps running on this box are PostgreSQL and Slony for replication. -- m@
"Matt Magoffin" <postgresql.org@msqr.us> writes: > I'm following up on this thread from a month ago on PG 8.3 memory use. I'm > afraid even after updating to 8.3.3 + this patch, I still see the same > overall memory trend. You can see what I'm looking at here with a couple > of memory graphs. These numbers don't even have any demonstrable connection to Postgres, let alone to an xpath-related memory leak. You're going to need to come up with a concrete test case if you want anyone to investigate. regards, tom lane
> These numbers don't even have any demonstrable connection to Postgres, > let alone to an xpath-related memory leak. You're going to need to come > up with a concrete test case if you want anyone to investigate. > > regards, tom lane As I said in the start of this thread, this is all just a hunch, and the graphs only show you the overall picture of this machine. However Postgres is the only application running, and I can see on the box that all the memory is being consumed by various postgres processes. In addition when Postgres is restarted, all this memory is freed. Something changed in the behavior of our database between running 8.1 and 8.3, and the most significant change we made was the use of xpath() and the XML type. My general question remains: should Postgres slowly be accumulating memory like this, possibly up to the maximum amount of shared memory we've allocated for it (4GB in this case)? If so then this memory trend isn't something I should worry about. -- m@