Thread: exit code -1073741819

exit code -1073741819

From
"Shuo Liu"
Date:
Hi, All,

I'm working on a GIS project using PostgreSQL and PostGIS. In the project I need to find locations of about 12K
addresses(the process is referred to as geocoding). I wrote some script to perform this task by calling a procedure
"tiger_geocoding"that is provided by PostGIS. My script seems to crash the server after a while with the following
message:


server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
connection to server was lost


The log shows the following message:


CurTransactionContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExecutorState: 122880 total in 4 blocks; 1912 free (9 chunks); 120968 used
ExprContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
ExprContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
ExprContext: 8192 total in 1 blocks; 8000 free (1 chunks); 192 used
ExprContext: 8192 total in 1 blocks; 8000 free (1 chunks); 192 used
ExprContext: 8192 total in 1 blocks; 8096 free (1 chunks); 96 used
SPI Exec: 0 total in 0 blocks; 0 free (0 chunks); 0 used
SPI Proc: 8192 total in 1 blocks; 2616 free (0 chunks); 5576 used
ExecutorState: 57344 total in 3 blocks; 35776 free (7 chunks); 21568 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
2007-07-10 12:25:57 LOG:  server process (PID 2004) exited with exit code -1073741819
2007-07-10 12:25:57 LOG:  terminating any other active server processes
2007-07-10 12:25:59 WARNING:  terminating connection because of crash of another server process


My script is listed below:

CREATE OR REPLACE FUNCTION geocode_fall2006 () RETURNS VOID AS $$
DECLARE
    address VARCHAR(255);
    geom GEOMETRY;
    tmp_geom GEOMETRY;
    counter INTEGER := 0;
    app_id VARCHAR(50);
    st VARCHAR(50);
    f_processed BOOLEAN;
    f_geocoded BOOLEAN;
BEGIN
    FOR app_id, address, st, f_processed, f_geocoded IN SELECT APPLICATION_ID, add, state, geoprocessed, geocoded FROM
fall2006LOOP 
        RAISE NOTICE 'add=%, app_id=%, state=%', address, app_id, st;
        IF upper(st)='OH' AND f_processed='f'
        THEN
            geom := geocode_test(address);
            IF geom IS NOT NULL
            THEN
                counter := counter + 1;
                -- TIGER SRID is 32767.
                tmp_geom := PointFromText('POINT(' || X(geom) || ' ' || Y(geom) || ')',4269);
                tmp_geom := transform(tmp_geom,4326);
                -- id := SRID(tmp_geom);
                -- RAISE NOTICE 'srid=%', id;
                UPDATE fall2006 SET lat_lon = tmp_geom WHERE APPLICATION_ID = app_id;
                UPDATE fall2006 SET geocoded = 't' WHERE APPLICATION_ID = app_id;
                RAISE NOTICE 'UPDATE fall2006 SET lat_lon = % WHERE APPLICATION_ID = %;', AsText(tmp_geom), app_id;
            END IF;
        UPDATE fall2006 SET geoprocessed = 't' WHERE APPLICATION_ID = app_id;
        END IF;
    END LOOP;
    RAISE NOTICE 'counter=%', counter;
END;
$$ LANGUAGE plpgsql


I googled and found a similar bug was reported for version 8.1 and was claimed to be fixed
(http://archives.postgresql.org/pgsql-bugs/2006-12/msg00214.php).However, the PostgreSQL in my machine is 8.2.4, which
issupposed to be free of the bug. Any suggestion will be greatly appreciated. 


_______________________________________________
Join Excite! - http://www.excite.com
The most personalized portal on the Web!



Re: exit code -1073741819

From
Tom Lane
Date:
"Shuo Liu" <delphet@excite.com> writes:
> The log shows the following message:

> CurTransactionContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
> ExecutorState: 122880 total in 4 blocks; 1912 free (9 chunks); 120968 used
> ExprContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
> ExprContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
> ExprContext: 8192 total in 1 blocks; 8000 free (1 chunks); 192 used
> ExprContext: 8192 total in 1 blocks; 8000 free (1 chunks); 192 used
> ExprContext: 8192 total in 1 blocks; 8096 free (1 chunks); 96 used
> SPI Exec: 0 total in 0 blocks; 0 free (0 chunks); 0 used
> SPI Proc: 8192 total in 1 blocks; 2616 free (0 chunks); 5576 used
> ExecutorState: 57344 total in 3 blocks; 35776 free (7 chunks); 21568 used
> ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used

The above is a fragment of a memory stats dump, which normally would
only be emitted if you had an out-of-memory situation.  However the
part of it that you've shown doesn't indicate any particularly heavy
memory usage.  Were there any large numbers in the preceding similarly-
formatted lines?  Was there anything possibly relevant in the log
entries before that?

> 2007-07-10 12:25:57 LOG:  server process (PID 2004) exited with exit code -1073741819

I suppose you're on Windows?  This is what is currently printed for an
Access Violation trap on Windows.  The fact that it came out partway
through a stats dump is pretty suspicious; it suggests that the trap
happened as a result of trying to scan the memory context bookkeeping
data, which implies a memory clobber of some sort.

So you're looking at a bug, but there's much too little data here to
guess what the bug is.  Can you get a debugger stack trace?  Or put
together a self-contained test case for someone else to poke at?

Actually the *first* thing to do is make sure you are up to date on
both Postgres and PostGIS versions.  No sense in spending a lot of time
chasing down a bug if it's already been fixed.

            regards, tom lane

Re: exit code -1073741819

From
"Shuo Liu"
Date:
Hi, Tom,

Thanks for the reply. I'll try to provide as much information as I can.


> ExecutorState: 122880 total in 4 blocks; 1912 free (9 chunks); 120968 used
> ExprContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
> ExprContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
> ExprContext: 8192 total in 1 blocks; 8000 free (1 chunks); 192 used
> ExprContext: 8192 total in 1 blocks; 8000 free (1 chunks); 192 used
> ExprContext: 8192 total in 1 blocks; 8096 free (1 chunks); 96 used
> SPI Exec: 0 total in 0 blocks; 0 free (0 chunks); 0 used
> SPI Proc: 8192 total in 1 blocks; 2616 free (0 chunks); 5576 used
> ExecutorState: 57344 total in 3 blocks; 35776 free (7 chunks); 21568 used
> ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
>The above is a fragment of a memory stats dump, which normally wouldonly be emitted if you had an out-of-memory
situation. However thepart of it that you've shown doesn't indicate any particularly heavymemory usage.  Were there any
largenumbers in the preceding similarly-formatted lines?   


The lines that start with "ExecutorState" have similar big numbers and numbers in other lines are smaller.


>Was there anything possibly relevant in the logentries before that?


Logentries right after normal script output are listed below (those are just the ones on the top and the rest are
similar):


TopMemoryContext: 11550912 total in 1377 blocks; 123560 free (833 chunks); 11427352 used
SPI Plan: 3072 total in 2 blocks; 888 free (0 chunks); 2184 used
SPI Plan: 7168 total in 3 blocks; 3368 free (0 chunks); 3800 used
SPI Plan: 1024 total in 1 blocks; 256 free (0 chunks); 768 used


> 2007-07-10 12:25:57 LOG:  server process (PID 2004) exited with exit code -1073741819I suppose you're on Windows?

Yes, it is Windows XP SP2. Sorry that I didn't mention it before.


>This is what is currently printed for an Access Violation trap on Windows.  The fact that it came out partway through
astats dump is pretty suspicious; it suggests that the trap happened as a result of trying to scan the memory context
bookkeepingdata, which implies a memory clobber of some sort.So you're looking at a bug, but there's much too little
datahere toguess what the bug is.  Can you get a debugger stack trace?  Or put together a self-contained test case for
someoneelse to poke at? 

The spatial database that the script is using is quite large (about 4 GB). So I think making a self-contained test case
wouldbe the last resort. I'll reset some parameters and try to generat debug info. I'll post it here as soon as it is
available.

>Actually the *first* thing to do is make sure you are up to date on both Postgres and PostGIS versions.  No sense in
spendinga lot of timechasing down a bug if it's already been fixed.     

The version of PostgreSQL is 8.2.4. PostGIS is 1.2.1 that is for PostgreSQL 8.2.4. So I think the installation is up to
date.

_______________________________________________
Join Excite! - http://www.excite.com
The most personalized portal on the Web!



Re: exit code -1073741819

From
"Shuo Liu"
Date:
Hi, Tom,

One more question. I'm new to PostgreSQL and not an expert in debugging. After checking the manual, I think I need to
turnon the following parameters in order to generate debug info. Do you think doing so would give us what we need to
pinpointthe problem? 

debug_assertions
trace_notify
trace_sort
debug_print_parse
debug_print_rewritten
debug_print_plan
debug_pretty_print

Thanks.



 --- On Tue 07/10, Tom Lane < tgl@sss.pgh.pa.us > wrote:
From: Tom Lane [mailto: tgl@sss.pgh.pa.us]
To: delphet@excite.com
     Cc: pgsql-general@postgresql.org
Date: Tue, 10 Jul 2007 20:04:41 -0400
Subject: Re: [GENERAL] exit code -1073741819

"Shuo Liu" <delphet@excite.com> writes:> The log shows the following message:> CurTransactionContext: 0 total in 0
blocks;0 free (0 chunks); 0 used> ExecutorState: 122880 total in 4 blocks; 1912 free (9 chunks); 120968 used>
ExprContext:8192 total in 1 blocks; 8176 free (0 chunks); 16 used> ExprContext: 8192 total in 1 blocks; 8176 free (0
chunks);16 used> ExprContext: 8192 total in 1 blocks; 8000 free (1 chunks); 192 used> ExprContext: 8192 total in 1
blocks;8000 free (1 chunks); 192 used> ExprContext: 8192 total in 1 blocks; 8096 free (1 chunks); 96 used> SPI Exec: 0
totalin 0 blocks; 0 free (0 chunks); 0 used> SPI Proc: 8192 total in 1 blocks; 2616 free (0 chunks); 5576 used>
ExecutorState:57344 total in 3 blocks; 35776 free (7 chunks); 21568 used> ExprContext: 0 total in 0 blocks; 0 free (0
chunks);0 usedThe above is a fragment of a memory stats dump, which normally wouldonly be emitted if you had an
out-of-memorysituation.  However thepart of it that you've shown 

doesn't indicate any particularly heavymemory usage.  Were there any large numbers in the preceding similarly-formatted
lines? Was there anything possibly relevant in the logentries before that?> 2007-07-10 12:25:57 LOG:  server process
(PID2004) exited with exit code -1073741819I suppose you're on Windows?  This is what is currently printed for anAccess
Violationtrap on Windows.  The fact that it came out partwaythrough a stats dump is pretty suspicious; it suggests that
thetraphappened as a result of trying to scan the memory context bookkeepingdata, which implies a memory clobber of
somesort.So you're looking at a bug, but there's much too little data here toguess what the bug is.  Can you get a
debuggerstack trace?  Or puttogether a self-contained test case for someone else to poke at?Actually the *first* thing
todo is make sure you are up to date onboth Postgres and PostGIS versions.  No sense in spending a lot of timechasing
downa bug if it's already been fixed. 

        regards, tom lane

_______________________________________________
Join Excite! - http://www.excite.com
The most personalized portal on the Web!



Re: exit code -1073741819

From
Tom Lane
Date:
"Shuo Liu" <delphet@excite.com> writes:
> TopMemoryContext: 11550912 total in 1377 blocks; 123560 free (833 chunks); 11427352 used

Whoa ... that is a whole lot more data than I'm used to seeing in
TopMemoryContext.  How many stats dump lines are there exactly (from
here to the crash report)?  If there are many (like many thousands) that
would explain why TopMemoryContext is so large, but would direct our
attention to what's generating so many.  If there are just a hundred or
so, which'd be typical, then we are looking at some kind of leak of
TopMemoryContext data --- though that still doesn't explain the crash.

> The spatial database that the script is using is quite large (about 4
> GB). So I think making a self-contained test case would be the last
> resort.

FWIW, I doubt that the content of the database is the key point here;
you could probably generate a test case with relatively little data,
or maybe a lot of easily-created dummy data.  However stripping it down
might require more insight into the nature of the bug than we have at
this point.

            regards, tom lane

Re: exit code -1073741819

From
Tom Lane
Date:
"Shuo Liu" <delphet@excite.com> writes:
> One more question. I'm new to PostgreSQL and not an expert in debugging. After checking the manual, I think I need to
turnon the following parameters in order to generate debug info. Do you think doing so would give us what we need to
pinpointthe problem? 

> debug_assertions
> trace_notify
> trace_sort
> debug_print_parse
> debug_print_rewritten
> debug_print_plan
> debug_pretty_print

If you can turn on debug_assertions, do so and see if it catches
anything; but it's not enabled in a standard build, and if it is enabled
it's on by default, so I doubt there's any win there unless you're
prepared to rebuild Postgres from source.  The others seem unlikely to
be helpful for this problem.

What would probably be most helpful is a stack trace, but that's not
easy to get on Windows ... which is why the shortest route to a fix
may well be to create a test case that someone can exercise on a more
debug-friendly platform.  If you got a stack trace, and it wasn't
conclusive, we'd have to go in that direction anyway.

            regards, tom lane

Re: exit code -1073741819

From
"Shuo Liu"
Date:
> Whoa ... that is a whole lot more data than I'm used to seeing in
TopMemoryContext. How many stats dump lines are there exactly (from
here to the crash report)?

OK, I didn't know that was a surprise. There are about 600 stats dump lines
in between.


>> The spatial database that the script is using is quite large (about 4
>> GB). So I think making a self-contained test case would be the last
>> resort.

>FWIW, I doubt that the content of the database is the key point here;
you could probably generate a test case with relatively little data,
or maybe a lot of easily-created dummy data. However stripping it down
might require more insight into the nature of the bug than we have at
this point.

I did a test in a small area (which is the street network in a county) before and it worked without crashing the
server.In that test there were about 600 records (or addresses) to be processed while in the current case there are
about12K records. Another difference is that the current case uses a larger base relation (the one I mentioned my
previousemail) that covers a whole state. I'm not sure whether it is the amount of records to be processed or the size
ofbase relation that causes the crash. 

_______________________________________________
Join Excite! - http://www.excite.com
The most personalized portal on the Web!



Re: exit code -1073741819

From
Tom Lane
Date:
"Shuo Liu" <delphet@excite.com> writes:
>> Whoa ... that is a whole lot more data than I'm used to seeing in
>> TopMemoryContext. How many stats dump lines are there exactly (from
>> here to the crash report)?

> OK, I didn't know that was a surprise. There are about 600 stats dump lines
> in between.

Hmm.  600 memory context records are not nearly enough to explain
11MB in TopMemoryContext ... but on the other hand I'm quite sure that
your process crashed before printing all the entries (there's a pretty
standard ordering to them and the last ones that should be there
aren't).  So we're still left with more questions than answers.

>> FWIW, I doubt that the content of the database is the key point here;
>> you could probably generate a test case with relatively little data,
>> or maybe a lot of easily-created dummy data. However stripping it down
>> might require more insight into the nature of the bug than we have at
>> this point.

> I did a test in a small area (which is the street network in a county)
> before and it worked without crashing the server.

OK, so maybe it's dependent on the size of the table.  Try generating a
test case by loading up just your schema + functions + a lot of dummy
entries generated by script.

Is the data proprietary?  If not, maybe you could arrange to send me a
dump off-list.  A short test-case script would be better, but let's not
waste huge amounts of person-time to save a few hours of downloading ...

            regards, tom lane

Re: exit code -1073741819

From
"Shuo Liu"
Date:
> OK, so maybe it's dependent on the size of the table. Try generating a
test case by loading up just your schema + functions + a lot of dummy
entries generated by script.
> Is the data proprietary? If not, maybe you could arrange to send me a
dump off-list. A short test-case script would be better, but let's not
waste huge amounts of person-time to save a few hours of downloading ...

That's what I was planning to do. I'll generate a table with dummy entries. I think we may try to use the smaller base
tablefirst. Once I can reproduce the problem I'll dump the database into a file and send it to you. Is there a server
thatI can upload the data to?  

Thanks again for your help.

_______________________________________________
Join Excite! - http://www.excite.com
The most personalized portal on the Web!



Re: exit code -1073741819

From
Tom Lane
Date:
"Shuo Liu" <delphet@excite.com> writes:
> That's what I was planning to do. I'll generate a table with dummy entries. I think we may try to use the smaller
basetable first. Once I can reproduce the problem I'll dump the database into a file and send it to you. Is there a
serverthat I can upload the data to?  

If what you come up with is too big for email (my threshold of pain for
that is in the tens of MB, dunno about you), contact me off-list and
we'll set something up.

            regards, tom lane

Re: exit code -1073741819

From
Tom Lane
Date:
"Shuo Liu" <delphet@excite.com> writes:
>> Whoa ... that is a whole lot more data than I'm used to seeing in
>> TopMemoryContext. How many stats dump lines are there exactly (from
>> here to the crash report)?

> OK, I didn't know that was a surprise. There are about 600 stats dump lines
> in between.

Well, my suspicion was correct: there were boatloads of memory contexts,
but we weren't seeing all of them.  The answer is twofold:

1. Your function is opening a cursor (maybe more than one, I didn't read
too carefully) on each iteration, and not closing it.  This results in
a sizable amount of memory being eaten --- it looks like a couple
hundred K per cursor on my machine.  Eventually you run out of memory.

2. PostGIS has a bug that causes a guaranteed crash once you run out of
memory, if you've been using transform(), because it sets up a memory
context that has no stats() method.  That's why the context stats
printout failed partway through --- it crashed as soon as it came to
one of the PostGIS contexts.

I've complained to the PostGIS folk about the latter, and hopefully
they'll fix it in their next release.  Meanwhile, you need to be more
careful about closing cursors when you're done with them.

            regards, tom lane

Re: exit code -1073741819

From
"Shuo Liu"
Date:
Hi, Tom,

Thank you so much for pointing out the problems. The first one seems very silly :-(. For the second one, I may stay out
oftrouble if I don't use transform() at all (I can tolerate some error without transform), am I right?  




 --- On Wed 07/11, Tom Lane < tgl@sss.pgh.pa.us > wrote:
From: Tom Lane [mailto: tgl@sss.pgh.pa.us]
To: delphet@excite.com
     Cc: pgsql-general@postgresql.org
Date: Wed, 11 Jul 2007 15:33:41 -0400
Subject: Re: [GENERAL] exit code -1073741819

"Shuo Liu" <delphet@excite.com> writes:>> Whoa ... that is a whole lot more data than I'm used to seeing in>>
TopMemoryContext.How many stats dump lines are there exactly (from>> here to the crash report)? > OK, I didn't know
thatwas a surprise. There are about 600 stats dump lines > in between.Well, my suspicion was correct: there were
boatloadsof memory contexts,but we weren't seeing all of them.  The answer is twofold:1. Your function is opening a
cursor(maybe more than one, I didn't readtoo carefully) on each iteration, and not closing it.  This results ina
sizableamount of memory being eaten --- it looks like a couplehundred K per cursor on my machine.  Eventually you run
outof memory.2. PostGIS has a bug that causes a guaranteed crash once you run out ofmemory, if you've been using
transform(),because it sets up a memorycontext that has no stats() method.  That's why the context statsprintout failed
partwaythrough --- it crashed as soon as it came toone of the  
PostGIS contexts.I've complained to the PostGIS folk about the latter, and hopefullythey'll fix it in their next
release. Meanwhile, you need to be morecareful about closing cursors when you're done with them.            regards,
tomlane---------------------------(end of broadcast)---------------------------TIP 9: In versions below 8.0, the
plannerwill ignore your desire to       choose an index scan if your joining column's datatypes do not       match 

_______________________________________________
Join Excite! - http://www.excite.com
The most personalized portal on the Web!



Re: exit code -1073741819

From
Tom Lane
Date:
"Shuo Liu" <delphet@excite.com> writes:
> Thank you so much for pointing out the problems. The first one seems very silly :-(. For the second one, I may stay
outof trouble if I don't use transform() at all (I can tolerate some error without transform), am I right?  

Hmm, I wouldn't suggest removing functionality to work around a bug that
presumably will get fixed soon.  It's not that common to run out of
memory (else this bug would've been found long ago), so I think you
could just live with it.

            regards, tom lane