Thread: Trouble with plpgsql on 7.4.6

Trouble with plpgsql on 7.4.6

From
"D'Arcy J.M. Cain"
Date:
Has anyone else had any problem installing plpgsql into a 7.4.6
database?  When I try (on NetBSD 2.0RC5) I get the following message:

createlang: language installation failed: server closed the connection
unexpectedly

This does not happen on 7.4.3, the other system that I have installed. 
It doesn't happen if I use 7.4.6 to access a database built by 7.4.3. 
It happens on 7.4.6 if I initdb a new database and simply run the
createlang command against it.  Here is a test script which I ran in a
scratch directory.

initdb .
pg_ctl -D . start
sleep 3
createlang plpgsql template1

The closest thing I could find in the archives is the message about
7.4.5 in
http://archives.postgresql.org/pgsql-hackers/2004-09/msg00915.php.  I
don't think there is a connection but I could be wrong.

I turned on a bunch of logging and this is what gets spit out into the
log:

Nov 18 11:49:18 panther cert198[21983]: [47-1] LOG:  00000: server
process (PID 7864) was terminated by signal 6
Nov 18 11:49:18 panther cert198[21983]: [47-2] LOCATION:  LogChildExit,
postmaster.c:2087
Nov 18 11:49:18 panther cert198[21983]: [48-1] LOG:  00000: terminating
any other active server processes
Nov 18 11:49:18 panther cert198[21983]: [48-2] LOCATION:  CleanupProc,
postmaster.c:2008
Nov 18 11:49:18 panther cert198[21983]: [49-1] LOG:  00000: all server
processes terminated; reinitializing
Nov 18 11:49:18 panther cert198[21983]: [49-2] LOCATION:  reaper,
postmaster.c:1920
Nov 18 11:49:18 panther cert198[1332]: [52-1] LOG:  00000: database
system was interrupted at 2004-11-18 11:43:53 EST
Nov 18 11:49:18 panther cert198[1332]: [52-2] LOCATION:  StartupXLOG,
xlog.c:2610
Nov 18 11:49:18 panther cert198[1332]: [53-1] LOG:  00000: checkpoint
record is at 0/9F5B40
Nov 18 11:49:18 panther cert198[1332]: [53-2] LOCATION:  StartupXLOG,
xlog.c:2628
Nov 18 11:49:18 panther cert198[1332]: [54-1] LOG:  00000: redo record
is at 0/9F5B40; undo record is at 0/0; shutdown TRUE
Nov 18 11:49:18 panther cert198[1332]: [54-2] LOCATION:  StartupXLOG,
xlog.c:2653
Nov 18 11:49:18 panther cert198[1332]: [55-1] LOG:  00000: next
transaction ID: 574; next OID: 74486
Nov 18 11:49:18 panther cert198[1332]: [55-2] LOCATION:  StartupXLOG,
xlog.c:2656
Nov 18 11:49:18 panther cert198[1332]: [56-1] LOG:  00000: database
system was not properly shut down; automatic recovery in progress
Nov 18 11:49:18 panther cert198[1332]: [56-2] LOCATION:  StartupXLOG,
xlog.c:2705
Nov 18 11:49:19 panther cert198[1332]: [57-1] LOG:  00000: redo starts
at 0/9F5B80
Nov 18 11:49:19 panther cert198[1332]: [57-2] LOCATION:  StartupXLOG,
xlog.c:2733
Nov 18 11:49:19 panther cert198[1332]: [58-1] LOG:  00000: invalid magic
number 0000 in log file 0, segment 0, offset 10477568
Nov 18 11:49:19 panther cert198[1332]: [58-2] LOCATION: 
ValidXLOGHeader, xlog.c:2048
Nov 18 11:49:19 panther cert198[1332]: [59-1] LOG:  00000: redo done at
0/9FBCAC
Nov 18 11:49:19 panther cert198[1332]: [59-2] LOCATION:  StartupXLOG,
xlog.c:2765
Nov 18 11:49:21 panther cert198[1332]: [60-1] LOG:  00000: database
system is ready
Nov 18 11:49:21 panther cert198[1332]: [60-2] LOCATION:  StartupXLOG,
xlog.c:2946

-- 
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: Trouble with plpgsql on 7.4.6

From
Tom Lane
Date:
"D'Arcy J.M. Cain" <darcy@druid.net> writes:
> Has anyone else had any problem installing plpgsql into a 7.4.6
> database?  When I try (on NetBSD 2.0RC5) I get the following message:

> createlang: language installation failed: server closed the connection
> unexpectedly

Probably indicates a failure to load the plpgsql shared library.

> I turned on a bunch of logging and this is what gets spit out into the
> log:

> Nov 18 11:49:18 panther cert198[21983]: [47-1] LOG:  00000: server
> process (PID 7864) was terminated by signal 6

There wasn't anything interesting just before that?

Signal 6 is SIGABRT (at least on my machine) so something decided to
abort() in that backend.  Postgres would abort() on assertion failure,
but it would have logged a message first.  My guess is that the shared
library loader did the abort(), and probably it griped on stderr rather
than to syslog.  Fix your postmaster startup arrangement so that stderr
goes someplace useful instead of to /dev/null, so you can see what it's
complaining about.
        regards, tom lane


Re: Trouble with plpgsql on 7.4.6

From
"D'Arcy J.M. Cain"
Date:
On Thu, 18 Nov 2004 14:25:41 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "D'Arcy J.M. Cain" <darcy@druid.net> writes:
> > Has anyone else had any problem installing plpgsql into a 7.4.6
> > database?  When I try (on NetBSD 2.0RC5) I get the following
> > message:
>
> > createlang: language installation failed: server closed the
> > connection unexpectedly
>
> Probably indicates a failure to load the plpgsql shared library.

I considered that but the file is where I expect it to be.  I did this:

[pgsql@panther:/data/cert198] $ pg_config --libdir
/usr/pkg/lib

The file is actually in /usr/pkg/lib/postgresql/plpgsql.so.  I wasn't
sure if the postgresql was added so I made a symlink to assure that it
was available in both. locations.

> > I turned on a bunch of logging and this is what gets spit out into
> > the log:
>
> > Nov 18 11:49:18 panther cert198[21983]: [47-1] LOG:  00000: server
> > process (PID 7864) was terminated by signal 6
>
> There wasn't anything interesting just before that?

Not that I can see.  I have attached the entire output of the capture
from the start.  Here are the diffs from the postgresql.conf created by
initdb.

--- postgresql.conf.orig        2004-11-18 15:33:03.000000000 -0500
+++ postgresql.conf     2004-11-18 15:33:17.000000000 -0500
@@ -136,24 +136,24 @@

 # - Syslog -

-#syslog = 0                    # range 0-2; 0=stdout; 1=both; 2=syslog
-#syslog_facility = 'LOCAL0'
-#syslog_ident = 'postgres'
+syslog = 1                     # range 0-2; 0=stdout; 1=both; 2=syslog
+syslog_facility = 'LOCAL4'
+syslog_ident = 'postgres'

 # - When to Log -

-#client_min_messages = notice  # Values, in order of decreasing detail:
+client_min_messages = debug5   # Values, in order of decreasing detail:
                                #   debug5, debug4, debug3, debug2,
debug1,
                                #   log, info, notice, warning, error

-#log_min_messages = notice     # Values, in order of decreasing detail:
+log_min_messages = debug5      # Values, in order of decreasing detail:
                                #   debug5, debug4, debug3, debug2,
debug1,
                                #   info, notice, warning, error, log,
fatal,
                                #   panic

-#log_error_verbosity = default   # terse, default, or verbose messages
+log_error_verbosity = verbose   # terse, default, or verbose messages

-#log_min_error_statement = panic # Values in order of increasing
severity:
+log_min_error_statement = debug5 # Values in order of increasing
severity:
                                 #   debug5, debug4, debug3, debug2,
debug1,
                                 #   info, notice, warning, error,
panic(off)

@@ -162,7 +162,7 @@
                                 # milliseconds.  Zero prints all
queries.
                                 # Minus-one disables.

-#silent_mode = false            # DO NOT USE without Syslog!
+silent_mode = false             # DO NOT USE without Syslog!

 # - What to Log -

@@ -170,11 +170,11 @@
 #debug_print_rewritten = false
 #debug_print_plan = false
 #debug_pretty_print = false
-#log_connections = false
-#log_duration = false
-#log_pid = false
-#log_statement = false
-#log_timestamp = false
+log_connections = true
+log_duration = true
+log_pid = true
+log_statement = true
+log_timestamp = true
 #log_hostname = false
 #log_source_port = false

> Signal 6 is SIGABRT (at least on my machine) so something decided to

Same on mine.

Thanks for the help.

--
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.

Attachment

Re: Trouble with plpgsql on 7.4.6

From
Tom Lane
Date:
"D'Arcy J.M. Cain" <darcy@druid.net> writes:
> I thought that this would have sent everything to both the log and the
> screen but I found that the syslog has much more detail.  I have
> attached that output.

We still need to look at the stderr output.  All this says is that
you're getting repeated abort()s.  It's unlikely that anything would
be coded to abort() without emitting any gripe at all --- but the gripe
is not appearing in syslog output, so stderr is the next place to look.

Also see about getting a stack trace from one of the core dumps.
        regards, tom lane


Re: Trouble with plpgsql on 7.4.6

From
"D'Arcy J.M. Cain"
Date:
On Sat, 20 Nov 2004 11:17:48 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "D'Arcy J.M. Cain" <darcy@druid.net> writes:
> > I thought that this would have sent everything to both the log and
> > the screen but I found that the syslog has much more detail.  I have
> > attached that output.
> 
> We still need to look at the stderr output.  All this says is that
> you're getting repeated abort()s.  It's unlikely that anything would
> be coded to abort() without emitting any gripe at all --- but the
> gripe is not appearing in syslog output, so stderr is the next place
> to look.

The stderr was in the previous message.  No gripes there either other
than in the startup after the failure.

> Also see about getting a stack trace from one of the core dumps.

I did look at the core file and here is what I saw:

#0  0x483cafeb in kill () from /usr/lib/libc.so.12
#1  0x483cd0af in __libc_mutex_catchall_stub (m=1212478892)   at /usr/src/lib/libc/thread-stub/thread-stub.c:112
#2  0x4843f0f7 in free (ptr=<incomplete type>)   at /usr/src/lib/libc/stdlib/malloc.c:1149
#3  0x081b3efc in AllocSetDelete (context=<error type>) at aset.c:464
#4  0x081b468a in MemoryContextDelete (context=<error type>) at
#mcxt.c:192

In fact it is calling the thread stubs but the applicatino is threaded. 
At least, that's what I see from this code in NetBSD.

#define CHECK_NOT_THREADED_ALWAYS() \
do {                    \   if (__isthreaded)       \       DIE();          \
} while (/*CONSTCOND*/0)

#if 1
#define CHECK_NOT_THREADED()    CHECK_NOT_THREADED_ALWAYS()
#else
#define CHECK_NOT_THREADED()    /* nothing */
#endif 

I am going to follow up with the NetBSD team but I pass this on in case
it suggests anything.  Note that this still only causes problems on
7.4.6 and not under 7.4.3.

-- 
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: Trouble with plpgsql on 7.4.6

From
Patrick Welche
Date:
On Tue, Nov 23, 2004 at 07:25:17AM -0500, D'Arcy J.M. Cain wrote:
> The stderr was in the previous message.  No gripes there either other
> than in the startup after the failure.
> 
> > Also see about getting a stack trace from one of the core dumps.
> 
> I did look at the core file and here is what I saw:
> 
> #0  0x483cafeb in kill () from /usr/lib/libc.so.12
> #1  0x483cd0af in __libc_mutex_catchall_stub (m=1212478892)
>     at /usr/src/lib/libc/thread-stub/thread-stub.c:112
> #2  0x4843f0f7 in free (ptr=<incomplete type>)
>     at /usr/src/lib/libc/stdlib/malloc.c:1149
> #3  0x081b3efc in AllocSetDelete (context=<error type>) at aset.c:464
> #4  0x081b468a in MemoryContextDelete (context=<error type>) at
> #mcxt.c:192

Would setting the following environment variable get you an earlier
abort / more logging? (pthread(3))
    PTHREAD_DIAGASSERT       Possible values are any combinations of:                                   A
Ignoreerrors.                                   a         Abort on errors, creating a core
              dump for further debugging.                                   E         Do not log errors to stdout.
                            e         Log errors to stdout.                                   L         Do not log
errorsvia                                             syslogd(8).                                   l         Log
errorsvia syslogd(8).
 

Cheers,

Patrick


Re: Trouble with plpgsql on 7.4.6

From
"D'Arcy J.M. Cain"
Date:
On Tue, 23 Nov 2004 07:25:17 -0500
"D'Arcy J.M. Cain" <darcy@druid.net> wrote:
> 
> #0  0x483cafeb in kill () from /usr/lib/libc.so.12
> #1  0x483cd0af in __libc_mutex_catchall_stub (m=1212478892)
>     at /usr/src/lib/libc/thread-stub/thread-stub.c:112
> #2  0x4843f0f7 in free (ptr=<incomplete type>)
>     at /usr/src/lib/libc/stdlib/malloc.c:1149
> #3  0x081b3efc in AllocSetDelete (context=<error type>) at aset.c:464
> #4  0x081b468a in MemoryContextDelete (context=<error type>) at
> #mcxt.c:192
> 
> In fact it is calling the thread stubs but the applicatino is
> threaded. At least, that's what I see from this code in NetBSD.
> 
> #define CHECK_NOT_THREADED_ALWAYS() \
> do {                    \
>     if (__isthreaded)       \
>         DIE();          \
> } while (/*CONSTCOND*/0)
> 
> #if 1
> #define CHECK_NOT_THREADED()    CHECK_NOT_THREADED_ALWAYS()
> #else
> #define CHECK_NOT_THREADED()    /* nothing */
> #endif 
> 
> I am going to follow up with the NetBSD team but I pass this on in
> case it suggests anything.  Note that this still only causes problems
> on 7.4.6 and not under 7.4.3.

OK, I have found the problem.  It turns out to be in the way that it is
built under the NetBSD pkgsrc system.  As soon as a module is loaded
that uses pthreads it sets a variable (__isthreaded) that causes the
above error.  The answer is to simply link PostgreSQL with -lpthread to
begin with.  We are making some changes to the pkgsrc system to do this
correctly with this and some other packages that could potentially have
the same problem.

Thanks for the help in finding this.

-- 
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.