Thread: exit code -1073741819
Hi, All,

I'm working on a GIS project using PostgreSQL and PostGIS. In the project I need to find locations of about 12K addresses (the process is referred to as geocoding). I wrote a script to perform this task by calling a procedure "tiger_geocoding" that is provided by PostGIS. My script seems to crash the server after a while with the following message:

    server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
    connection to server was lost

The log shows the following message:

    CurTransactionContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
    ExecutorState: 122880 total in 4 blocks; 1912 free (9 chunks); 120968 used
    ExprContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
    ExprContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
    ExprContext: 8192 total in 1 blocks; 8000 free (1 chunks); 192 used
    ExprContext: 8192 total in 1 blocks; 8000 free (1 chunks); 192 used
    ExprContext: 8192 total in 1 blocks; 8096 free (1 chunks); 96 used
    SPI Exec: 0 total in 0 blocks; 0 free (0 chunks); 0 used
    SPI Proc: 8192 total in 1 blocks; 2616 free (0 chunks); 5576 used
    ExecutorState: 57344 total in 3 blocks; 35776 free (7 chunks); 21568 used
    ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
    2007-07-10 12:25:57 LOG: server process (PID 2004) exited with exit code -1073741819
    2007-07-10 12:25:57 LOG: terminating any other active server processes
    2007-07-10 12:25:59 WARNING: terminating connection because of crash of another server process

My script is listed below:

CREATE OR REPLACE FUNCTION geocode_fall2006() RETURNS VOID AS $$
DECLARE
    address     VARCHAR(255);
    geom        GEOMETRY;
    tmp_geom    GEOMETRY;
    counter     INTEGER := 0;
    app_id      VARCHAR(50);
    st          VARCHAR(50);
    f_processed BOOLEAN;
    f_geocoded  BOOLEAN;
BEGIN
    FOR app_id, address, st, f_processed, f_geocoded IN
        SELECT APPLICATION_ID, add, state, geoprocessed, geocoded FROM fall2006
    LOOP
        RAISE NOTICE 'add=%, app_id=%, state=%', address, app_id, st;
        IF upper(st) = 'OH' AND f_processed = 'f' THEN
            geom := geocode_test(address);
            IF geom IS NOT NULL THEN
                counter := counter + 1;
                -- TIGER SRID is 32767.
                tmp_geom := PointFromText('POINT(' || X(geom) || ' ' || Y(geom) || ')', 4269);
                tmp_geom := transform(tmp_geom, 4326);
                -- id := SRID(tmp_geom);
                -- RAISE NOTICE 'srid=%', id;
                UPDATE fall2006 SET lat_lon = tmp_geom WHERE APPLICATION_ID = app_id;
                UPDATE fall2006 SET geocoded = 't' WHERE APPLICATION_ID = app_id;
                RAISE NOTICE 'UPDATE fall2006 SET lat_lon = % WHERE APPLICATION_ID = %;', AsText(tmp_geom), app_id;
            END IF;
            UPDATE fall2006 SET geoprocessed = 't' WHERE APPLICATION_ID = app_id;
        END IF;
    END LOOP;
    RAISE NOTICE 'counter=%', counter;
END;
$$ LANGUAGE plpgsql;

I googled and found that a similar bug was reported for version 8.1 and was claimed to be fixed (http://archives.postgresql.org/pgsql-bugs/2006-12/msg00214.php). However, the PostgreSQL on my machine is 8.2.4, which is supposed to be free of that bug. Any suggestion will be greatly appreciated.

_______________________________________________
Join Excite! - http://www.excite.com
The most personalized portal on the Web!
"Shuo Liu" <delphet@excite.com> writes:
> The log shows the following message:
> CurTransactionContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
> ExecutorState: 122880 total in 4 blocks; 1912 free (9 chunks); 120968 used
> ExprContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
> ExprContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
> ExprContext: 8192 total in 1 blocks; 8000 free (1 chunks); 192 used
> ExprContext: 8192 total in 1 blocks; 8000 free (1 chunks); 192 used
> ExprContext: 8192 total in 1 blocks; 8096 free (1 chunks); 96 used
> SPI Exec: 0 total in 0 blocks; 0 free (0 chunks); 0 used
> SPI Proc: 8192 total in 1 blocks; 2616 free (0 chunks); 5576 used
> ExecutorState: 57344 total in 3 blocks; 35776 free (7 chunks); 21568 used
> ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used

The above is a fragment of a memory stats dump, which normally would only be emitted if you had an out-of-memory situation. However, the part of it that you've shown doesn't indicate any particularly heavy memory usage. Were there any large numbers in the preceding similarly-formatted lines? Was there anything possibly relevant in the log entries before that?

> 2007-07-10 12:25:57 LOG: server process (PID 2004) exited with exit code -1073741819

I suppose you're on Windows? This is what is currently printed for an Access Violation trap on Windows. The fact that it came out partway through a stats dump is pretty suspicious; it suggests that the trap happened as a result of trying to scan the memory context bookkeeping data, which implies a memory clobber of some sort.

So you're looking at a bug, but there's much too little data here to guess what the bug is. Can you get a debugger stack trace? Or put together a self-contained test case for someone else to poke at?

Actually the *first* thing to do is make sure you are up to date on both Postgres and PostGIS versions. No sense in spending a lot of time chasing down a bug if it's already been fixed.

			regards, tom lane
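[A quick way to see why -1073741819 reads as an Access Violation: interpreted as an unsigned 32-bit value, that exit code is the Windows NTSTATUS 0xC0000005, i.e. STATUS_ACCESS_VIOLATION. The check below can be run in any psql session; it is a sketch of the arithmetic, not part of the original thread.]

```sql
-- -1073741819 is the two's-complement (signed 32-bit) reading of a
-- Windows status code; adding 2^32 recovers the unsigned value, and
-- to_hex() shows it is 0xC0000005, STATUS_ACCESS_VIOLATION.
SELECT to_hex((-1073741819)::bigint + 4294967296);  -- c0000005
```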
Hi, Tom,

Thanks for the reply. I'll try to provide as much information as I can.

> ExecutorState: 122880 total in 4 blocks; 1912 free (9 chunks); 120968 used
> ExprContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
> ExprContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
> ExprContext: 8192 total in 1 blocks; 8000 free (1 chunks); 192 used
> ExprContext: 8192 total in 1 blocks; 8000 free (1 chunks); 192 used
> ExprContext: 8192 total in 1 blocks; 8096 free (1 chunks); 96 used
> SPI Exec: 0 total in 0 blocks; 0 free (0 chunks); 0 used
> SPI Proc: 8192 total in 1 blocks; 2616 free (0 chunks); 5576 used
> ExecutorState: 57344 total in 3 blocks; 35776 free (7 chunks); 21568 used
> ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used

> The above is a fragment of a memory stats dump, which normally would only be emitted if you had an out-of-memory situation. However the part of it that you've shown doesn't indicate any particularly heavy memory usage. Were there any large numbers in the preceding similarly-formatted lines?

The lines that start with "ExecutorState" have similarly big numbers, and the numbers in the other lines are smaller.

> Was there anything possibly relevant in the log entries before that?

The log entries right after the normal script output are listed below (these are just the ones at the top; the rest are similar):

    TopMemoryContext: 11550912 total in 1377 blocks; 123560 free (833 chunks); 11427352 used
    SPI Plan: 3072 total in 2 blocks; 888 free (0 chunks); 2184 used
    SPI Plan: 7168 total in 3 blocks; 3368 free (0 chunks); 3800 used
    SPI Plan: 1024 total in 1 blocks; 256 free (0 chunks); 768 used

> 2007-07-10 12:25:57 LOG: server process (PID 2004) exited with exit code -1073741819
> I suppose you're on Windows?

Yes, it is Windows XP SP2. Sorry that I didn't mention it before.

> This is what is currently printed for an Access Violation trap on Windows. The fact that it came out partway through a stats dump is pretty suspicious; it suggests that the trap happened as a result of trying to scan the memory context bookkeeping data, which implies a memory clobber of some sort. So you're looking at a bug, but there's much too little data here to guess what the bug is. Can you get a debugger stack trace? Or put together a self-contained test case for someone else to poke at?

The spatial database that the script is using is quite large (about 4 GB), so I think making a self-contained test case would be the last resort. I'll reset some parameters and try to generate debug info. I'll post it here as soon as it is available.

> Actually the *first* thing to do is make sure you are up to date on both Postgres and PostGIS versions. No sense in spending a lot of time chasing down a bug if it's already been fixed.

The version of PostgreSQL is 8.2.4, and PostGIS is 1.2.1, the build for PostgreSQL 8.2.4. So I think the installation is up to date.
Hi, Tom,

One more question. I'm new to PostgreSQL and not an expert in debugging. After checking the manual, I think I need to turn on the following parameters in order to generate debug info. Do you think doing so would give us what we need to pinpoint the problem?

    debug_assertions
    trace_notify
    trace_sort
    debug_print_parse
    debug_print_rewritten
    debug_print_plan
    debug_pretty_print

Thanks.
"Shuo Liu" <delphet@excite.com> writes:
> TopMemoryContext: 11550912 total in 1377 blocks; 123560 free (833 chunks); 11427352 used

Whoa ... that is a whole lot more data than I'm used to seeing in TopMemoryContext. How many stats dump lines are there exactly (from here to the crash report)? If there are many (like many thousands) that would explain why TopMemoryContext is so large, but would direct our attention to what's generating so many. If there are just a hundred or so, which'd be typical, then we are looking at some kind of leak of TopMemoryContext data --- though that still doesn't explain the crash.

> The spatial database that the script is using is quite large (about 4
> GB). So I think making a self-contained test case would be the last
> resort.

FWIW, I doubt that the content of the database is the key point here; you could probably generate a test case with relatively little data, or maybe a lot of easily-created dummy data. However stripping it down might require more insight into the nature of the bug than we have at this point.

			regards, tom lane
"Shuo Liu" <delphet@excite.com> writes:
> One more question. I'm new to PostgreSQL and not an expert in debugging. After checking the manual, I think I need to turn on the following parameters in order to generate debug info. Do you think doing so would give us what we need to pinpoint the problem?
> debug_assertions
> trace_notify
> trace_sort
> debug_print_parse
> debug_print_rewritten
> debug_print_plan
> debug_pretty_print

If you can turn on debug_assertions, do so and see if it catches anything; but it's not enabled in a standard build, and if it is enabled it's on by default, so I doubt there's any win there unless you're prepared to rebuild Postgres from source. The others seem unlikely to be helpful for this problem.

What would probably be most helpful is a stack trace, but that's not easy to get on Windows ... which is why the shortest route to a fix may well be to create a test case that someone can exercise on a more debug-friendly platform. If you got a stack trace, and it wasn't conclusive, we'd have to go in that direction anyway.

			regards, tom lane
> Whoa ... that is a whole lot more data than I'm used to seeing in TopMemoryContext. How many stats dump lines are there exactly (from here to the crash report)?

OK, I didn't know that was a surprise. There are about 600 stats dump lines in between.

>> The spatial database that the script is using is quite large (about 4
>> GB). So I think making a self-contained test case would be the last
>> resort.

> FWIW, I doubt that the content of the database is the key point here; you could probably generate a test case with relatively little data, or maybe a lot of easily-created dummy data. However stripping it down might require more insight into the nature of the bug than we have at this point.

I did a test in a small area (the street network in a county) before, and it worked without crashing the server. In that test there were about 600 records (or addresses) to be processed, while in the current case there are about 12K records. Another difference is that the current case uses a larger base relation (the one I mentioned in my previous email) that covers a whole state. I'm not sure whether it is the number of records to be processed or the size of the base relation that causes the crash.
"Shuo Liu" <delphet@excite.com> writes:
>> Whoa ... that is a whole lot more data than I'm used to seeing in
>> TopMemoryContext. How many stats dump lines are there exactly (from
>> here to the crash report)?

> OK, I didn't know that was a surprise. There are about 600 stats dump lines
> in between.

Hmm. 600 memory context records are not nearly enough to explain 11MB in TopMemoryContext ... but on the other hand I'm quite sure that your process crashed before printing all the entries (there's a pretty standard ordering to them and the last ones that should be there aren't). So we're still left with more questions than answers.

>> FWIW, I doubt that the content of the database is the key point here;
>> you could probably generate a test case with relatively little data,
>> or maybe a lot of easily-created dummy data. However stripping it down
>> might require more insight into the nature of the bug than we have at
>> this point.

> I did a test in a small area (which is the street network in a county)
> before and it worked without crashing the server.

OK, so maybe it's dependent on the size of the table. Try generating a test case by loading up just your schema + functions + a lot of dummy entries generated by script.

Is the data proprietary? If not, maybe you could arrange to send me a dump off-list. A short test-case script would be better, but let's not waste huge amounts of person-time to save a few hours of downloading ...

			regards, tom lane
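[The "dummy entries generated by script" suggestion can be done entirely in SQL with generate_series. A sketch only: the fall2006 columns used below are the ones visible in the original function, and the fabricated address format is an assumption; the goal is row volume, not geocodable addresses.]

```sql
-- Populate a scratch copy of fall2006 with 12K synthetic rows so the
-- crash can be reproduced without the proprietary data.
INSERT INTO fall2006 (APPLICATION_ID, add, state, geoprocessed, geocoded)
SELECT 'APP' || n,                                      -- synthetic key
       (n % 9999 + 1) || ' Main St, Columbus, OH',      -- fabricated address
       'OH',
       'f',
       'f'
FROM generate_series(1, 12000) AS g(n);
```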
> OK, so maybe it's dependent on the size of the table. Try generating a test case by loading up just your schema + functions + a lot of dummy entries generated by script.

> Is the data proprietary? If not, maybe you could arrange to send me a dump off-list. A short test-case script would be better, but let's not waste huge amounts of person-time to save a few hours of downloading ...

That's what I was planning to do. I'll generate a table with dummy entries. I think we may try to use the smaller base table first. Once I can reproduce the problem, I'll dump the database into a file and send it to you. Is there a server that I can upload the data to?

Thanks again for your help.
"Shuo Liu" <delphet@excite.com> writes:
> That's what I was planning to do. I'll generate a table with dummy entries. I think we may try to use the smaller base table first. Once I can reproduce the problem, I'll dump the database into a file and send it to you. Is there a server that I can upload the data to?

If what you come up with is too big for email (my threshold of pain for that is in the tens of MB, dunno about you), contact me off-list and we'll set something up.

			regards, tom lane
"Shuo Liu" <delphet@excite.com> writes:
>> Whoa ... that is a whole lot more data than I'm used to seeing in
>> TopMemoryContext. How many stats dump lines are there exactly (from
>> here to the crash report)?

> OK, I didn't know that was a surprise. There are about 600 stats dump lines
> in between.

Well, my suspicion was correct: there were boatloads of memory contexts, but we weren't seeing all of them. The answer is twofold:

1. Your function is opening a cursor (maybe more than one, I didn't read too carefully) on each iteration, and not closing it. This results in a sizable amount of memory being eaten --- it looks like a couple hundred K per cursor on my machine. Eventually you run out of memory.

2. PostGIS has a bug that causes a guaranteed crash once you run out of memory, if you've been using transform(), because it sets up a memory context that has no stats() method. That's why the context stats printout failed partway through --- it crashed as soon as it came to one of the PostGIS contexts.

I've complained to the PostGIS folk about the latter, and hopefully they'll fix it in their next release. Meanwhile, you need to be more careful about closing cursors when you're done with them.

			regards, tom lane
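[For anyone hitting the same leak: in plpgsql, a cursor opened with OPEN stays open, holding its memory, until it is explicitly closed or the transaction ends, so a function called once per row can accumulate thousands of open portals inside one transaction. A minimal sketch of the pattern Tom describes; the function, cursor, and table names here are illustrative, not taken from the tiger geocoder.]

```sql
CREATE OR REPLACE FUNCTION lookup_street(addr text) RETURNS integer AS $$
DECLARE
    cur refcursor;       -- each OPEN creates a new portal in the transaction
    street_id integer;
BEGIN
    OPEN cur FOR SELECT id FROM streets WHERE name = addr;  -- hypothetical table
    FETCH cur INTO street_id;
    CLOSE cur;           -- without this, every call leaks one open portal
    RETURN street_id;
END;
$$ LANGUAGE plpgsql;
```

(A plain `FOR rec IN SELECT ... LOOP`, like the one in the original geocode_fall2006 function, uses an implicit cursor that plpgsql closes automatically; only explicitly OPENed cursors need a matching CLOSE.)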
Hi, Tom,

Thank you so much for pointing out the problems. The first one seems very silly :-(. For the second one, I may stay out of trouble if I don't use transform() at all (I can tolerate some error without the transform), am I right?
"Shuo Liu" <delphet@excite.com> writes:
> Thank you so much for pointing out the problems. The first one seems very silly :-(. For the second one, I may stay out of trouble if I don't use transform() at all (I can tolerate some error without the transform), am I right?

Hmm, I wouldn't suggest removing functionality to work around a bug that presumably will get fixed soon. It's not that common to run out of memory (else this bug would've been found long ago), so I think you could just live with it.

			regards, tom lane