Thread: Debugging guidance

Debugging guidance

From
"Rich Garabedian"
Date:
Hello All,

I've inherited a fairly complex database that makes heavy use of plpgsql.
Recently, this database has been crashing at random times when it tries to
execute a specific plpgsql function. I have lots of print statements
sprinkled throughout the plpgsq function and the crash seems to happen at
random places for no apparent reason. According to my print statements, the
crash sometimes occurs when the function has succesfully exited! So it may
not be related to the function at all. Because of this apparent randomness,
I can not reliably duplicate the crash.

I'm very new to postgreSQL, and I'm not much of database guru either. Can
anyone suggest further debugging avenues?

                           version
-------------------------------------------------------------
 PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.96

This is the log output printed when the database does crash:

2002-10-24 14:09:33 [5737]   DEBUG:  server process (pid 6551) was
terminated by signal 11
2002-10-24 14:09:33 [5737]   DEBUG:  terminating any other active server
processes
2002-10-24 14:09:33 [5737]   DEBUG:  all server processes terminated;
reinitializing shared memory and semaphores
2002-10-24 14:09:33 [6552]   DEBUG:  database system was interrupted at
2002-10-24 14:09:25 EDT
2002-10-24 14:09:33 [6552]   DEBUG:  checkpoint record is at 2C/68264F00
2002-10-24 14:09:33 [6552]   DEBUG:  redo record is at 2C/68264F00; undo
record is at 0/0; shutdown TRUE
2002-10-24 14:09:33 [6552]   DEBUG:  next transaction id: 43454478; next
oid: 411378833
2002-10-24 14:09:33 [6552]   DEBUG:  database system was not properly shut
down; automatic recovery in progress
2002-10-24 14:09:33 [6552]   DEBUG:  redo starts at 2C/68264F40
2002-10-24 14:09:33 [6552]   DEBUG:  ReadRecord: record with zero length at
2C/68268FC8
2002-10-24 14:09:33 [6552]   DEBUG:  redo done at 2C/68264F40

Thanks very much,

Rich




Re: Debugging guidance

From
Stephan Szabo
Date:
On Fri, 25 Oct 2002, Rich Garabedian wrote:

> Hello All,
>
> I've inherited a fairly complex database that makes heavy use of plpgsql.
> Recently, this database has been crashing at random times when it tries to
> execute a specific plpgsql function. I have lots of print statements
> sprinkled throughout the plpgsq function and the crash seems to happen at
> random places for no apparent reason. According to my print statements, the
> crash sometimes occurs when the function has succesfully exited! So it may
> not be related to the function at all. Because of this apparent randomness,
> I can not reliably duplicate the crash.
>
> I'm very new to postgreSQL, and I'm not much of database guru either. Can
> anyone suggest further debugging avenues?

Well, make sure that you've got your ulimit set to allow core files and
then you should have a core in the database directory which you can
use to get a backtrace from gdb.  If you don't get anything, you may need
to compile with debugging and provoke the error again.


Re: Debugging guidance

From
Tom Lane
Date:
"Rich Garabedian" <richg@mav-mail.com> writes:
> I've inherited a fairly complex database that makes heavy use of plpgsql.
> Recently, this database has been crashing at random times when it tries to
> execute a specific plpgsql function.

Ugh.

> I'm very new to postgreSQL, and I'm not much of database guru either. Can
> anyone suggest further debugging avenues?

1. Update to PG 7.2.3.  We don't put out dot-releases without good
reason.  (It's entirely possible that your problem is just the
plpgsql_dstring_append bug fixed in 7.2.2.)  If you still see the
problem, then:

2. Build the new installation with --enable-debug, and maybe also
--enable-cassert.  (--enable-debug is harmless to leave on in
production, except for wasting a couple of megabytes of disk space.
--enable-cassert will slow things down a trifle, so you'd only want
to use it when chasing a problem, IMHO.)  I am not sure whether any
RPMs come built this way; you may need to build from source.
(Anyone know if an SRPM can easily be built with --enable-debug added?)

3. Make sure the postmaster is started with "ulimit -c unlimited";
this is not the default on a lot of Linux distros.  This way, you'll
get a core-dump file in the database subdirectory when a backend
crash occurs.

4. Use gdb to get a stack backtrace from the core file, and send it
to pgsql-hackers for more help.

            regards, tom lane

Inserting and Updating

From
"Douglas Blood"
Date:
I have been looking for a function that trys to update a table based on some
criteria and if there are 0 rows updated to insert a row. If anyone knows a
way to do this or has a function that they would let me use I would
appriciate it.
Douglas Blood
Matraex Inc.