Thread: Out of memory
Hi,

I've been having repeated troubles trying to get a PostgreSQL app to play nicely on Ubuntu. I recently posted a message on this list about an out of memory error and got a resolution by reducing the work_mem setting. However I'm now getting further out of memory issues during the same stage of the plpgsql function as mentioned before.

The function itself is run as part of a larger transaction which does the following:

1/ Maintains 104 tables (15 PostGIS tables) by loading and applying incremental table changes. A typical incremental load will maintain about 10,000 rows.

2/ When each one of these tables is updated, an after trigger is fired that maintains an associated table revision table.

3/ After all of the tables are maintained, a plpgsql function is called to build/maintain a set of de-normalised tables. These tables total about 20GB. Each one of these tables is compared against the previous table revision to determine its row changes. It's in this function that the out of memory exception is occurring.

The server log error message I'm getting in the function is here: http://pastebin.com/346zi2sS. It's very long and contains the top transaction memory debug info.

My initial observation about this error is that maybe PostgreSQL is encountering a memory corruption error, because the amount of OS memory does not seem to run out. The plpgsql function uses functions from both PostGIS and pgc_checksum (http://pgfoundry.org/projects/pg-comparator), so maybe they are the cause of the problem. Or maybe I have configured something wrong...

I did some memory logging during the execution of the function. It shows, for the majority of the transaction execution, that the actual memory used is about 1GB (growing from the initial 600MB) with about 6.5GB cached by the OS:

                 total       used       free     shared    buffers     cached
    Mem:          8004       7839        165          0          0       6802
    -/+ buffers/cache:       1037       6967
    Swap:          397          0        397

But just before the out of memory error occurs there is a spike to 2.5GB of used memory, though there is still 4.5GB cached by the OS:

                 total       used       free     shared    buffers     cached
    Mem:          8004       7702        301          0          0       4854
    -/+ buffers/cache:       2848       5156
    Swap:          397          0        397

Then after the error the memory slowly returns to this state:

                 total       used       free     shared    buffers     cached
    Mem:          8004       1478       6526          0          0       1133
    -/+ buffers/cache:        345       7659
    Swap:          397          0        397

The OS I'm running is: Linux TSTLHAPP01 2.6.32-29-server #58-Ubuntu SMP Fri Feb 11 21:06:51 UTC 2011 x86_64 GNU/Linux. It's running on VMware and has 2 CPUs and 8GB of RAM. This VM is dedicated to PostgreSQL; not much else is running other than cacti, ssh and ftp server daemons.

The main OS parameters I have tuned are:

    vm.swappiness = 0
    vm.overcommit_memory = 2
    kernel.shmmax = 4196769792
    kernel.shmall = 1024602

And the PostgreSQL version is: PostgreSQL 9.0.3 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit.

The main changed postgresql.conf parameters I've tuned are:

    shared_buffers = 512MB
    maintenance_work_mem = 512MB
    temp_buffers = 256MB
    work_mem = 1MB
    wal_buffers = 16MB
    effective_cache_size = 4094MB

The size of the database is 350GB. The typical number of users connected to the database is 1 or 2. This database is used for loading external data, managing revision table information and generating and outputting de-normalised datasets, so it does not have a high number of transactions running. Typically 1 large one per day.

Two questions:

1) Have I set the OS and PostgreSQL parameters to sensible values given the hardware and database utilization?

2) Can anyone help me make sense of the top transaction memory error to help track down the issue?

Any other suggestions would be greatly appreciated.

Thanks,
Jeremy

________________________________________
From: Jeremy Palmer
Sent: Saturday, 26 March 2011 9:57 p.m.
To: Scott Marlowe
Cc: pgsql-general@postgresql.org
Subject: RE: [GENERAL] Out of memory

Hi Scott,

It was the work_mem that was set too high. I reduced it to 32MB and the function executed.

Just so I understand this: every time a sort is performed within a function, the sort memory is allocated and then not released until the function completes, rather than being deallocated after each sort operation has completed?

Thanks,
Jeremy

________________________________________
From: Scott Marlowe [scott.marlowe@gmail.com]
Sent: Friday, 25 March 2011 5:04 p.m.
To: Jeremy Palmer
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Out of memory

On Thu, Mar 24, 2011 at 9:23 PM, Jeremy Palmer <JPalmer@linz.govt.nz> wrote:
> I've been getting database out of memory failures with some queries which
> deal with a reasonable amount of data.
>
> I was wondering what I should be looking at to stop this from happening.
>
> The typical messages I've been getting are like this:
> http://pastebin.com/Jxfu3nYm
>
> The OS is:
>
> Linux TSTLHAPP01 2.6.32-29-server #58-Ubuntu SMP Fri Feb 11 21:06:51 UTC
> 2011 x86_64 GNU/Linux.
>
> It's running on VMware and has 2 CPUs and 8GB of RAM. This VM is
> dedicated to PostgreSQL. The main OS parameters I have tuned are:
>
> work_mem = 200MB

That's a really big work_mem. I have mainline db servers with 128G of RAM that have work_mem set to 16M, and that is still considered a little high in my book. If you drop work_mem down to 1MB, does the out of memory error go away?

work_mem is how much memory EACH sort can use on its own; if you have a plpgsql procedure that keeps running query after query, it could use a LOT of memory really fast.
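An illustrative sketch of that last point (the function name below is a hypothetical stand-in, not something from this thread): work_mem can be changed per session, so different values can be tried against the failing function without editing postgresql.conf and reloading:

    SET work_mem = '1MB';              -- session-level override; reverts with RESET or at disconnect
    SELECT my_maintenance_function();  -- hypothetical stand-in for the real plpgsql function
    RESET work_mem;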
On Tue, 2011-04-05 at 21:50 +1200, Jeremy Palmer wrote:
> Hi,
>
> I've been having repeated troubles trying to get a PostgreSQL app to play nicely on Ubuntu. I recently posted a message on this list about an out of memory error and got a resolution by reducing the work_mem setting. However I'm now getting further out of memory issues during the same stage of the plpgsql function as mentioned before.
>
> The function itself is run as part of a larger transaction which does the following:

Where is the source to the function?

Regards,
	Jeff Davis
On 04/05/11 2:50 AM, Jeremy Palmer wrote:
> I've been having repeated troubles trying to get a PostgreSQL app to play nicely on Ubuntu. I recently posted a message on this list about an out of memory error and got a resolution by reducing the work_mem setting. However I'm now getting further out of memory issues during the same stage of the plpgsql function as mentioned before.
>
> The function itself is run as part of a larger transaction which does the following:
>
> 1/ Maintains 104 tables (15 PostGIS tables) by loading and applying incremental table changes. A typical incremental load will maintain about 10,000 rows.
>
> 2/ When each one of these tables is updated, an after trigger is fired that maintains an associated table revision table.
>
> 3/ After all of the tables are maintained, a plpgsql function is called to build/maintain a set of de-normalised tables. These tables total about 20GB. Each one of these tables is compared against the previous table revision to determine its row changes. It's in this function that the out of memory exception is occurring.

a few random questions...

Does that all really have to be a single transaction?

Do you really need to use triggers for your revision tracking, and can't you rely on your daily update cycle to manually set the revision information?

Is it really necessary to generate massive denormalized tables, rather than using views to join the data?

> shared_buffers = 512MB
> maintenance_work_mem = 512MB
> temp_buffers = 256MB
> work_mem = 1MB
> wal_buffers = 16MB
> effective_cache_size = 4094MB
>
> The size of the database is 350GB. The typical number of users connected to the database is 1 or 2. This database is used for loading external data, managing revision table information and generating and outputting de-normalised datasets, so it does not have a high number of transactions running. Typically 1 large one per day.

with only 1-2 connections, you certainly could increase the work_mem. Alternately, this single giant transaction could manually set a larger work_mem which would only apply to it.

Personally, given your 8GB system and what you've described, I think I'd set the tuning parameters something like...

    shared_buffers = 1GB
    maintenance_work_mem = 128MB
    temp_buffers = 64MB
    work_mem = 16MB
    wal_buffers = 16MB
    effective_cache_size = 4094MB

adjust effective_cache_size to somewhat less than the 'cached' value shown in `free -m` after your system has been running for a while.
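A minimal sketch of that per-transaction suggestion, assuming the load already runs inside one explicit transaction (the values here are just John's illustrative numbers, not settings from the thread's configuration):

    BEGIN;
    SET LOCAL work_mem = '16MB';                -- applies only until this transaction commits or rolls back
    SET LOCAL maintenance_work_mem = '128MB';
    -- ... run the incremental load and the de-normalisation functions here ...
    COMMIT;

SET LOCAL reverts automatically at COMMIT or ROLLBACK, so the low global work_mem still protects every other session.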
Hi John,

> Does that all really have to be a single transaction?

Yes - I need to ensure that all of the changesets and denormalised tables are created in the same transaction, so that if an error occurs the database is rolled back to the last successfully applied changeset. I don't want to get into the business of splitting it into separate transactions and then having to revert changes that were applied in a previous transaction step.

> Do you really need to use triggers for your revision tracking, and can't
> you rely on your daily update cycle to manually set the revision information?

They are not necessary, but it has the cleanest code implementation and makes the revision maintenance of the tables almost transparent. If they are causing the problem I could change the logic...

> Is it really necessary to generate massive denormalized tables, rather
> than using views to join the data?

Yes - creating the tables is complex, and often involves complex functions and multiple temp tables. The overall time to create these tables is somewhere in the area of 3 hours on this server. I'm also unloading these tables multiple times for separate purposes, so they would need to be materialised anyway.

> with only 1-2 connections, you certainly could increase the work_mem.

I can't increase this value at the moment on this server, because I was getting out of memory errors with the initial population of the database (which builds the denormalized tables, but does not determine the changeset to the previous table revision). I tried values from 256MB down to 2MB and could only get the query to run with 1MB. I suspect even this was pushing the boundary, so when I got to the next stage in my testing - applying incremental updates - the memory issue raised its head again.

Regards,
Jeremy
Hi Jeff,

> Where is the source to the function?

The source is located here: https://github.com/linz/linz_bde_uploader

The main function being called, LDS_MaintainSimplifiedLayers, is on line 37 of https://github.com/linz/linz_bde_uploader/blob/master/sql/lds_layer_functions.sql. The actual out of memory exception was caught within the bde_GetTableDifferences function, at line 3263 of https://github.com/linz/linz_bde_uploader/blob/master/sql/bde_control_functions.sql.

When I was getting an out of memory issue while creating the tables (not maintaining them), the query that seemed to kill the transaction was the one located at line 1463 of https://github.com/linz/linz_bde_uploader/blob/master/sql/lds_layer_functions.sql. After I dropped work_mem to 1MB it got past that and completed OK. But during the maintenance of the tables the row differences need to be calculated and then applied to the table - see the LDS_ApplyTableDifferences function on line 353.

Regards,
Jeremy
Well, after a few days of further investigation I still can't track the issue down. The main problem is that I can only reproduce the error by running the whole transaction, so I can't isolate it down to a simple use case or even a smaller subset of the transaction, which would have been nice for posting to this list.

Does anyone have an idea of how I might go about tackling this problem now? Should I try further reducing the memory settings? Or install a debug version of PostgreSQL and get some further information about what is going on before the error?

Any advice would be greatly appreciated.

Regards,
Jeremy
Hi All,

I'm running PostgreSQL 9.0.3 and getting an out of memory error while running a big transaction. This error does not crash the backend.

The nature of this transaction is that it sequentially applies data updates to a large number (104) of tables, and then, after applying those updates, a series of materialised views are created or updated using a set of pl/pgsql functions. I need this operation to be one transaction so I can roll back the update if any issue occurs. The size of the database is 350GB and I would characterise this system as a data warehouse.

At this stage I can't isolate the problem down to a simple use case or even a smaller subset of the transaction, which would have been nice for posting to this list. I can only replicate the error when I run the "entire" transaction - running the table update part or the materialised view parts separately works fine. To make matters worse, I do not see any context around my "out of memory" error message, which makes the message useless.

The actual error occurs during a table compare operation within a function which uses 2 cursors to scan for differences. However, directly before this I have a query that generates a temp table which I had trouble with earlier (when initially tuning the server) and had to set work_mem to 1MB so it would run:

CREATE TEMP TABLE tmp_titles AS
SELECT
    TTL.audit_id AS id,
    TTL.title_no,
    TTL.status,
    TTLT.char_value AS type,
    LOC.name AS land_district,
    TTL.issue_date,
    TTLG.char_value AS guarantee_status,
    string_agg(
        DISTINCT(
            ETTT.char_value || ', ' ||
            ETT.share || COALESCE(', ' || LGD.legal_desc_text, '') ||
            COALESCE(', ' || to_char(ROUND(LGD.total_area, 0), 'FM9G999G999G999G999') || ' m2', '')
        ),
        E'\r\n'
        ORDER BY
            ETTT.char_value || ', ' ||
            ETT.share || COALESCE(', ' || LGD.legal_desc_text, '') ||
            COALESCE(', ' || to_char(ROUND(LGD.total_area, 0), 'FM9G999G999G999G999') || ' m2', '') ASC
    ) AS estate_description,
    string_agg(
        DISTINCT
            CASE PRP.type
                WHEN 'CORP' THEN PRP.corporate_name
                WHEN 'PERS' THEN COALESCE(PRP.prime_other_names || ' ', '') || PRP.prime_surname
            END,
        ', '
        ORDER BY
            CASE PRP.type
                WHEN 'CORP' THEN PRP.corporate_name
                WHEN 'PERS' THEN COALESCE(PRP.prime_other_names || ' ', '') || PRP.prime_surname
            END ASC
    ) AS owners,
    count(
        DISTINCT
            CASE PRP.type
                WHEN 'CORP' THEN PRP.corporate_name
                WHEN 'PERS' THEN COALESCE(PRP.prime_other_names || ' ', '') || PRP.prime_surname
            END
    ) AS number_owners,
    TPA.title_no IS NOT NULL AS part_share,
    -- With PostGIS 1.5.2 the ST_Collect aggregate returns a truncated
    -- collection when a null value is found. To fix this the shapes
    -- are ordered so all null shape rows are at the end of the input list.
    ST_Multi(ST_Collect(PAR.shape ORDER BY PAR.shape ASC)) AS shape
FROM
    crs_title TTL
    LEFT JOIN crs_title_estate ETT ON TTL.title_no = ETT.ttl_title_no AND ETT.status = 'REGD'
    LEFT JOIN crs_estate_share ETS ON ETT.id = ETS.ett_id AND ETT.status = 'REGD'
    LEFT JOIN crs_proprietor PRP ON ETS.id = PRP.ets_id AND PRP.status = 'REGD'
    LEFT JOIN crs_legal_desc LGD ON ETT.lgd_id = LGD.id AND LGD.type = 'ETT' AND LGD.status = 'REGD'
    LEFT JOIN crs_legal_desc_prl LGP ON LGD.id = LGP.lgd_id
    LEFT JOIN (
        SELECT title_no
        FROM tmp_parcel_titles
        GROUP BY title_no
        HAVING count(*) > 1
    ) TPA ON TTL.title_no = TPA.title_no
    LEFT JOIN (
        SELECT id, (ST_Dump(shape)).geom AS shape
        FROM crs_parcel
        WHERE status = 'CURR'
        AND ST_GeometryType(shape) IN ('ST_MultiPolygon', 'ST_Polygon')
    ) PAR ON LGP.par_id = PAR.id
    JOIN crs_locality LOC ON TTL.ldt_loc_id = LOC.id
    JOIN crs_sys_code TTLG ON TTL.guarantee_status = TTLG.code AND TTLG.scg_code = 'TTLG'
    JOIN crs_sys_code TTLT ON TTL.type = TTLT.code AND TTLT.scg_code = 'TTLT'
    LEFT JOIN crs_sys_code ETTT ON ETT.type = ETTT.code AND ETTT.scg_code = 'ETTT'
WHERE
    TTL.status IN ('LIVE', 'PRTC')
    AND TTL.title_no NOT IN (SELECT title_no FROM tmp_excluded_titles)
GROUP BY
    TTL.audit_id,
    TTL.title_no,
    TTL.status,
    TTLT.char_value,
    LOC.name,
    TTL.issue_date,
    TTLG.char_value,
    TPA.title_no;

This query does a lot of string concatenation and uses the new 9.0 string_agg function. The result set is about 2 million rows.

So down to questions:

1) Is there anything I can do to get more information out about this memory error message? A debug build maybe, or attach a debugger and set some break points?

2) Is there anything I can do to optimise the memory so this transaction might be able to run?

3) I noticed the Top Memory Context dump in the server log contained a lot of MCs for table indexes/PKs (about 850) - see http://pastebin.com/346zi2sS. Is this a problem? Could these MCs be cleaned up part way through the transaction? Maybe use a savepoint? Or do they exist for the life of the session or transaction?

4) Can anyone help me make sense of the top transaction memory error to help track down the issue? What numbers should I be looking for?

Well, I'm pretty much at the end of the line in terms of getting PostgreSQL to work for this task, so any guidance would be so much appreciated.

Thanks again,
Jeremy

PS. My config:

The OS I'm running is: Linux TSTLHAPP01 2.6.32-29-server #58-Ubuntu SMP Fri Feb 11 21:06:51 UTC 2011 x86_64 GNU/Linux. It's running on VMware and has 2 CPUs and 8GB of RAM. This VM is dedicated to PostgreSQL; not much else is running other than cacti, ssh and ftp server daemons.

The main OS parameters I have tuned are:

    vm.swappiness = 0
    vm.overcommit_memory = 2
    kernel.shmmax = 4196769792
    kernel.shmall = 1024602

And the PostgreSQL version is: PostgreSQL 9.0.3 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit.

The main changed postgresql.conf parameters I've tuned are:

    shared_buffers = 512MB
    maintenance_work_mem = 512MB
    temp_buffers = 256MB
    work_mem = 1MB
    wal_buffers = 16MB
    effective_cache_size = 4094MB

The typical number of users connected to the database is 1 or 2.
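As an aside, a quick illustrative way to confirm which values are actually in effect for the parameters listed above (pg_settings is the standard system view, so this is just a sanity check against a stale postgresql.conf):

    SELECT name, setting, unit, source
    FROM pg_settings
    WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
                   'temp_buffers', 'wal_buffers', 'effective_cache_size');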
Jeremy Palmer <JPalmer@linz.govt.nz> writes:
> I'm running PostgreSQL 9.0.3 and getting an out of memory error while running a big transaction. This error does not crash the backend.

If it's a standard "out of memory" message, there should be a memory context map dumped to the postmaster's stderr. (Which is inconvenient for some logging arrangements, but it's hard to improve that without risking not being able to print the map for lack of memory ...) If you could show us the map it might help to figure out what's happening.

			regards, tom lane
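An illustrative check related to this point: where the server's stderr (and hence the memory context map) actually ends up depends on the logging settings, which can be inspected from any session:

    SHOW log_destination;     -- e.g. 'stderr' and/or 'syslog'
    SHOW logging_collector;   -- if on, stderr output is captured into files under log_directory
    SHOW log_directory;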
Jeremy Palmer <JPalmer@linz.govt.nz> writes:
> Ok I have attached the map, or at least what I think the map is.

Yup, that's what I was after. It looks like the main problem is here:

> PortalHeapMemory: 16384 total in 4 blocks; 5944 free (0 chunks); 10440 used
> ExecutorState: 122880 total in 4 blocks; 63984 free (8 chunks); 58896 used
> ExprContext: 2622363000 total in 9 blocks; 21080 free (15 chunks); 2622341920 used

You've evidently got a leak during execution of a query that's being run in a "portal", which most likely is a cursor or plpgsql FOR-IN-SELECT query. Unfortunately there's not enough information here to tell which query that is, but maybe you can narrow it down now. I'm guessing that some specific function or operator you're using in that query is leaking memory with successive executions.

			regards, tom lane
Hi Tom,

Wow, thank you so much for the hint!

The plpgsql code that could be to blame is in the snippet below. I had a look and I'm not sure why it might be leaking. Is it because I assign v_id1 and v_id2 to the return table 'id' record, return it, and then assign to v_id1 or v_id2 again from the cursor?

CREATE OR REPLACE FUNCTION bde_control.bde_gettabledifferences(
    p_table1 regclass,
    p_table2 regclass,
    p_compare_key name
)
RETURNS TABLE("action" character, id bigint) AS
...
...
    FETCH FIRST FROM v_table_cur1 INTO v_id1, v_check1, v_uniq1;
    FETCH FIRST FROM v_table_cur2 INTO v_id2, v_check2, v_uniq2;

    WHILE v_id1 IS NOT NULL AND v_id2 IS NOT NULL LOOP
        IF v_id1 < v_id2 THEN
            action := 'D';
            id := v_id1;
            RETURN NEXT;
            FETCH NEXT FROM v_table_cur1 INTO v_id1, v_check1, v_uniq1;
            CONTINUE;
        ELSIF v_id2 < v_id1 THEN
            action := 'I';
            id := v_id2;
            RETURN NEXT;
            FETCH NEXT FROM v_table_cur2 INTO v_id2, v_check2, v_uniq2;
            CONTINUE;
        ELSIF v_uniq1 <> v_uniq2 THEN
            action := 'X';
            id := v_id1;
            RETURN NEXT;
        ELSIF v_check1 <> v_check2 THEN
            action := 'U';
            id := v_id1;
            RETURN NEXT;
        END IF;
        FETCH NEXT FROM v_table_cur1 INTO v_id1, v_check1, v_uniq1;
        FETCH NEXT FROM v_table_cur2 INTO v_id2, v_check2, v_uniq2;
    END LOOP;

    WHILE v_id1 IS NOT NULL LOOP
        action := 'D';
        id := v_id1;
        RETURN NEXT;
        FETCH NEXT FROM v_table_cur1 INTO v_id1, v_check1, v_uniq1;
    END LOOP;

    WHILE v_id2 IS NOT NULL LOOP
        action := 'I';
        id := v_id2;
        RETURN NEXT;
        FETCH NEXT FROM v_table_cur2 INTO v_id2, v_check2, v_uniq2;
    END LOOP;

    CLOSE v_table_cur1;
    CLOSE v_table_cur2;

    RETURN;

The full function can be read here: https://github.com/linz/linz_bde_uploader/blob/master/sql/bde_control_functions.sql#L3263

The actual query for both cursors, in the case of the table that was probably causing the error, looks like this (v_table_cur1 and v_table_cur2 only differ by the table referenced):

SELECT
    id AS ID,
    COALESCE('V|' || CAST(T.estate_description AS TEXT), '|N') ||
    '|V' || CAST(T.guarantee_status AS TEXT) ||
    '|V' || CAST(T.issue_date AS TEXT) ||
    '|V' || CAST(T.land_district AS TEXT) ||
    '|V' || CAST(T.number_owners AS TEXT) ||
    '|V' || CAST(T.part_share AS TEXT) ||
    COALESCE('V|' || CAST(T.shape AS TEXT), '|N') ||
    '|V' || CAST(T.status AS TEXT) ||
    '|V' || CAST(T.title_no AS TEXT) ||
    '|V' || CAST(T.type AS TEXT) AS check_sum,
    '' AS check_uniq
FROM lds.titles AS T
ORDER BY id ASC;

The definition for the table looks like this:

CREATE TABLE titles (
    id INTEGER NOT NULL PRIMARY KEY,
    title_no VARCHAR(20) NOT NULL,
    status VARCHAR(4) NOT NULL,
    type TEXT NOT NULL,
    land_district VARCHAR(100) NOT NULL,
    issue_date TIMESTAMP NOT NULL,
    guarantee_status TEXT NOT NULL,
    estate_description TEXT,
    number_owners INT8 NOT NULL,
    part_share BOOLEAN NOT NULL,
    shape GEOMETRY
);

CREATE INDEX shx_title_shape ON titles USING gist (shape);

Thanks,
Jeremy
Jeremy Palmer <JPalmer@linz.govt.nz> writes:
> The plpgsql code that could be to blame is in the snippet below. I had a look and I'm not sure why it might be leaking. Is it because I assign v_id1 and v_id2 to the return table 'id' record, return it, and then assign to v_id1 or v_id2 again from the cursor?

No, given the info from the memory map I'd have to say that the leakage is in the cursor, not in what you do in the plpgsql function. The cursor query looks fairly unexciting except for the cast from geometry to text. I don't have PostGIS installed here so I can't do any testing, but I wonder whether the leak goes away if you remove that part of the query (ie, leave the shape out of the "checksum" for testing purposes). If so, you probably ought to file the issue as a PostGIS bug.

			regards, tom lane
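A sketch of the test being suggested, based on the cursor query quoted earlier: the geometry-to-text cast is simply left out of the concatenated "checksum":

    SELECT
        id AS ID,
        COALESCE('V|' || CAST(T.estate_description AS TEXT), '|N') ||
        '|V' || CAST(T.guarantee_status AS TEXT) ||
        '|V' || CAST(T.issue_date AS TEXT) ||
        '|V' || CAST(T.land_district AS TEXT) ||
        '|V' || CAST(T.number_owners AS TEXT) ||
        '|V' || CAST(T.part_share AS TEXT) ||
        -- shape deliberately omitted from the checksum for this test
        '|V' || CAST(T.status AS TEXT) ||
        '|V' || CAST(T.title_no AS TEXT) ||
        '|V' || CAST(T.type AS TEXT) AS check_sum,
        '' AS check_uniq
    FROM lds.titles AS T
    ORDER BY id ASC;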
> No, given the info from the memory map I'd have to say that the leakage
> is in the cursor not in what you do in the plpgsql function. The cursor
> query looks fairly unexciting except for the cast from geometry to text.
> I don't have PostGIS installed here so I can't do any testing, but I
> wonder whether the leak goes away if you remove that part of the query
> (ie, leave the shape out of the "checksum" for testing purposes).
> If so, you probably ought to file the issue as a PostGIS bug.

Ok, I removed the geometry column from the cursor query within the function and the session still runs out of memory. I'm still seeing the same error message as well:

PortalHeapMemory: 16384 total in 4 blocks; 5944 free (0 chunks); 10440 used
ExecutorState: 122880 total in 4 blocks; 63984 free (8 chunks); 58896 used
ExprContext: 2496819768 total in 9 blocks; 21080 free (15 chunks); 2496798688 used

So I guess it's not likely to be the PostGIS geometry to text cast that is leaking the memory.

One thing that has got me interested now is the query that executes directly before it (see SQL below). If I remove the geometry column that is generated using the ST_Collect aggregate function, the subsequent function involving the cursor query completes and the transaction also runs to completion.

Is there any way that ST_Collect could be leaking memory into a context that does not get cleaned up after the query runs? Or do I have two leaks going on here?!

Cheers,
Jeremy

CREATE TEMP TABLE tmp_titles AS
SELECT
    TTL.audit_id AS id,
    TTL.title_no,
    TTL.status,
    TTLT.char_value AS type,
    LOC.name AS land_district,
    TTL.issue_date,
    TTLG.char_value AS guarantee_status,
    string_agg(
        DISTINCT(
            ETTT.char_value || ', ' ||
            ETT.share || COALESCE(', ' || LGD.legal_desc_text, '') ||
            COALESCE(', ' || to_char(ROUND(LGD.total_area, 0), 'FM9G999G999G999G999') || ' m2', '')
        ),
        E'\r\n'
        ORDER BY
            ETTT.char_value || ', ' ||
            ETT.share || COALESCE(', ' || LGD.legal_desc_text, '') ||
            COALESCE(', ' || to_char(ROUND(LGD.total_area, 0), 'FM9G999G999G999G999') || ' m2', '') ASC
    ) AS estate_description,
    string_agg(
        DISTINCT
            CASE PRP.type
                WHEN 'CORP' THEN PRP.corporate_name
                WHEN 'PERS' THEN COALESCE(PRP.prime_other_names || ' ', '') || PRP.prime_surname
            END,
        ', '
        ORDER BY
            CASE PRP.type
                WHEN 'CORP' THEN PRP.corporate_name
                WHEN 'PERS' THEN COALESCE(PRP.prime_other_names || ' ', '') || PRP.prime_surname
            END ASC
    ) AS owners,
    count(
        DISTINCT
            CASE PRP.type
                WHEN 'CORP' THEN PRP.corporate_name
                WHEN 'PERS' THEN COALESCE(PRP.prime_other_names || ' ', '') || PRP.prime_surname
            END
    ) AS number_owners,
    TPA.title_no IS NOT NULL AS part_share,
    -- With PostGIS 1.5.2 the ST_Collect aggregate returns a truncated
    -- collection when a null value is found. To fix this the shapes
    -- are ordered so all null shape rows are at the end of the input list.
    ST_Multi(ST_Collect(PAR.shape ORDER BY PAR.shape ASC)) AS shape
FROM
    crs_title TTL
    LEFT JOIN crs_title_estate ETT ON TTL.title_no = ETT.ttl_title_no AND ETT.status = 'REGD'
    LEFT JOIN crs_estate_share ETS ON ETT.id = ETS.ett_id AND ETT.status = 'REGD'
    LEFT JOIN crs_proprietor PRP ON ETS.id = PRP.ets_id AND PRP.status = 'REGD'
    LEFT JOIN crs_legal_desc LGD ON ETT.lgd_id = LGD.id AND LGD.type = 'ETT' AND LGD.status = 'REGD'
    LEFT JOIN crs_legal_desc_prl LGP ON LGD.id = LGP.lgd_id
    LEFT JOIN (
        SELECT title_no
        FROM tmp_parcel_titles
        GROUP BY title_no
        HAVING count(*) > 1
    ) TPA ON TTL.title_no = TPA.title_no
    LEFT JOIN (
        SELECT id, (ST_Dump(shape)).geom AS shape
        FROM crs_parcel
        WHERE status = 'CURR'
        AND ST_GeometryType(shape) IN ('ST_MultiPolygon', 'ST_Polygon')
    ) PAR ON LGP.par_id = PAR.id
    JOIN crs_locality LOC ON TTL.ldt_loc_id = LOC.id
    JOIN crs_sys_code TTLG ON TTL.guarantee_status = TTLG.code AND TTLG.scg_code = 'TTLG'
    JOIN crs_sys_code TTLT ON TTL.type = TTLT.code AND TTLT.scg_code = 'TTLT'
    LEFT JOIN crs_sys_code ETTT ON ETT.type = ETTT.code AND ETTT.scg_code = 'ETTT'
WHERE
    TTL.status IN ('LIVE', 'PRTC')
    AND TTL.title_no NOT IN (SELECT title_no FROM tmp_excluded_titles)
GROUP BY
    TTL.audit_id,
    TTL.title_no,
    TTL.status,
    TTLT.char_value,
    LOC.name,
    TTL.issue_date,
    TTLG.char_value,
    TPA.title_no;
Jeremy Palmer <JPalmer@linz.govt.nz> writes:
> Ok, I removed the geometry column from the cursor query within the function and the session still runs out of memory. I'm still seeing the same error message as well:

> PortalHeapMemory: 16384 total in 4 blocks; 5944 free (0 chunks); 10440 used
> ExecutorState: 122880 total in 4 blocks; 63984 free (8 chunks); 58896 used
> ExprContext: 2496819768 total in 9 blocks; 21080 free (15 chunks); 2496798688 used

> So I guess it's not likely to be the PostGIS geometry to text cast that is leaking the memory.

OK, so that was a wrong guess.

> One thing that has got me interested now is the query that executes directly before it (see SQL below). If I remove the geometry column that is generated using the ST_Collect aggregate function, the subsequent function involving the cursor query completes and the transaction also runs to completion.

Hrm. We were pretty much guessing as to which query was running in that portal, I think. It seems entirely plausible that this other query is the one at fault instead. It might be premature to blame ST_Collect per se though --- in particular I'm wondering about the ORDER BY on the ST_Collect's input. But if this line of thought is correct, you ought to be able to exhibit a memory leak using just that sub-part of that query, without the surrounding function or any other baggage. Maybe the leak wouldn't drive the backend to complete failure without that additional overhead; but a leak of a couple gig ought to be pretty obvious when watching the process with "top" or similar tool.

			regards, tom lane
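A rough isolation sketch of what's being suggested, using only the parcel aggregation from the query above; the GROUP BY key here (one row per legal description) is a simplification for testing, not the grouping the real query uses:

    SELECT
        LGP.lgd_id,
        ST_Multi(ST_Collect(PAR.shape ORDER BY PAR.shape ASC)) AS shape
    FROM crs_legal_desc_prl LGP
    JOIN (
        SELECT id, (ST_Dump(shape)).geom AS shape
        FROM crs_parcel
        WHERE status = 'CURR'
        AND ST_GeometryType(shape) IN ('ST_MultiPolygon', 'ST_Polygon')
    ) PAR ON LGP.par_id = PAR.id
    GROUP BY LGP.lgd_id;

Watching the backend process in top (or similar) while this runs should make a multi-gigabyte leak obvious, per the suggestion above.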
>> Ok, I removed the geometry column from the cursor query within the function and the session still runs out of memory. I'm still seeing the same error message as well:

>> PortalHeapMemory: 16384 total in 4 blocks; 5944 free (0 chunks); 10440 used
>> ExecutorState: 122880 total in 4 blocks; 63984 free (8 chunks); 58896 used
>> ExprContext: 2496819768 total in 9 blocks; 21080 free (15 chunks); 2496798688 used

>> So I guess it's not likely to be the PostGIS geometry to text cast that is leaking the memory.

> OK, so that was a wrong guess.

Hi Tom,

I finally tracked down the issue! The query that was generating the temp table used as input into the 2 cursor queries was generating an invalid, very large geometry (about 200MB).

It turned out I had a bug in the previous testing I was doing, and I didn't actually remove the geometry column from the second cursor. So I guess a 200MB geometry being cast to text used too much memory. Not sure if there is still a leak... but I guess that depends on whether the geometry expands to over 1GB when converted to text.

Anyway, I would like to personally thank you for your time in helping me with this issue.

Regards,
Jeremy
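For completeness, an illustrative way of hunting for that kind of oversized or invalid collected geometry, assuming the tmp_titles temp table from the earlier query is still available in the session (pg_column_size and ST_IsValid are standard PostgreSQL/PostGIS functions):

    SELECT
        id,
        title_no,
        pg_column_size(shape) AS shape_bytes,   -- on-disk size of the stored geometry value
        ST_IsValid(shape) AS is_valid
    FROM tmp_titles
    WHERE shape IS NOT NULL
    ORDER BY pg_column_size(shape) DESC
    LIMIT 10;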