Thread: Updating multiple bool values crashes backend

Updating multiple bool values crashes backend

From: pgsql-bugs@postgresql.org
Sean Kelly (S.Kelly@ncl.ac.uk) reports a bug with a severity of 1
The lower the number, the more severe it is.

Short Description
Updating multiple bool values crashes backend

Long Description
The bug is best described by an example; see below.

Tested on two systems:
Intel Pentium III 600
128Mb RAM
Linux 2.2.17

AMD K6 350
96Mb RAM
Linux 2.2.16

both PostgreSQL 7.0.2

Sample Code
users=> select username,added from users_tbl where username like 'neta%';
 username | added
----------+-------
 neta1    | f
 neta2    | f
 neta3    | f
 neta4    | f
(4 rows)

users=> update users_tbl set added=TRUE where username like 'neta%';
pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>\q

bash$ tail ~postgres/server.log
Server process (pid 23747) exited with status 11 at Tue Oct 24 13:52:29 2000
Terminating any active server processes...
Server processes were terminated at Tue Oct 24 13:52:29 2000
Reinitializing shared memory and semaphores
The Data Base System is starting up
DEBUG:  Data Base System is starting up at Tue Oct 24 13:52:29 2000
DEBUG:  Data Base System was interrupted being in production at Tue Oct 24 13:51:22 2000
DEBUG:  Data Base System is in production state at Tue Oct 24 13:52:29 2000

Re: Updating multiple bool values crashes backend

From: Thomas Lockhart
> Short Description
> Updating multiple bool values crashes backend

I cannot reproduce this example with 7.0.2 on my Linux-2.2.16 laptop. We
will need more details and a reproducible example to help out...

                       - Thomas

Re: Updating multiple bool values crashes backend

From: Sean Kelly
Here is a debug (level 2) output - does this help any
more?...  If not, what should I provide for you in terms of
debugging?...


FindExec: found "/usr/local/postgres/bin/postgres" using argv[0]
binding ShmemCreate(key=52e2c1, size=1104896)
DEBUG:    Data Base System is starting up at Wed Oct 25 13:15:17 2000
DEBUG:    Data Base System was shut down at Wed Oct 25 13:14:49 2000
DEBUG:    Data Base System is in production state at Wed Oct 25 13:15:17 2000
proc_exit(0)
shmem_exit(0)
exit(0)
/usr/local/postgres/bin/postmaster: reaping dead processes...
/usr/local/postgres/bin/postmaster: ServerLoop:     handling reading 5
/usr/local/postgres/bin/postmaster: ServerLoop:     handling reading 5
/usr/local/postgres/bin/postmaster: ServerLoop:     handling writing 5
/usr/local/postgres/bin/postmaster: BackendStartup: pid 28428 user sean db users socket 5
/usr/local/postgres/bin/postmaster child[28428]: starting with (/usr/local/postgres/bin/postgres -d2 -v131072 -p users )
FindExec: found "/usr/local/postgres/bin/postgres" using argv[0]
started: host=localhost user=sean database=users
InitPostgres
StartTransactionCommand
query: SELECT usesuper FROM pg_user WHERE usename = 'sean'
ProcessQuery
CommitTransactionCommand
StartTransactionCommand
query: delete from dialup_tbl where username='sib';
ProcessQuery
CommitTransactionCommand
StartTransactionCommand
query: delete from users_tbl where username='sib';
ProcessQuery
query: SELECT oid FROM "dialup_tbl" WHERE "username" = $1 FOR UPDATE OF "dialup_tbl"
/usr/local/postgres/bin/postmaster: reaping dead processes...
/usr/local/postgres/bin/postmaster: CleanupProc: pid 28428 exited with status 11
Server process (pid 28428) exited with status 11 at Wed Oct 25 13:15:35 2000
Terminating any active server processes...
Server processes were terminated at Wed Oct 25 13:15:35 2000
Reinitializing shared memory and semaphores
shmem_exit(0)
binding ShmemCreate(key=52e325, size=1104896)
/usr/local/postgres/bin/postmaster: ServerLoop:     handling reading 5
/usr/local/postgres/bin/postmaster: ServerLoop:     handling reading 5
DEBUG:    Data Base System is starting up at Wed Oct 25 13:15:35 2000
DEBUG:    Data Base System was interrupted being in production at Wed Oct 25 13:15:17 2000
/usr/local/postgres/bin/postmaster: ServerLoop:     handling writing 5
The Data Base System is starting up
/usr/local/postgres/bin/postmaster: ServerLoop:     handling writing 5
DEBUG:    Data Base System is in production state at Wed Oct 25 13:15:35 2000
proc_exit(0)
shmem_exit(0)
exit(0)
/usr/local/postgres/bin/postmaster: reaping dead processes...

    Thanks,

--
Sean Kelly <S.Kelly@ncl.ac.uk>
"If 99% is good enough, then gravity will not work for 14 minutes
 every day."

Re: Updating multiple bool values crashes backend

From: Thomas Lockhart
>         Here is a debug (level 2) output - does this help any
> more?...  If not, what should I provide for you in terms of
> debugging?...

Hmm. I didn't find the update/select combination you specified in your
problem statement in your debugging output, but it likely wouldn't have
helped anyway.

Your initial problem statement was of the form "if you do this, then if
you do that, you will get a server crash". That problem statement is
simple and testable, *if* it were accompanied by a reproducible example.

We are not likely to be able to help track down a problem if we cannot
reproduce it. So I would suggest the following:

1) dump your database using pg_dump or pg_dumpall.

2) reload your database using the dump from (1).

3) show that the problem is reproducible for you.

4) distill the scenario down to the fundamental elements, if possible.

5) file a problem report, and be ready to send an example including
schema and data.

The problem for us is that your problem statement is not reproducible
here. So you will need to show how *you* can reproduce it using fresh
data for us to be able to help. There are other causes of database
failure (e.g. bad server memory) which we have no control over and which
are less likely to be a problem if you can get a reproducible case.

Just remember that "reproducible" doesn't necessarily mean that you can
get it to happen more than once on the same database. Ideally it means
that you can create a new database and demonstrate the same problem.
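Thomas's steps 1-3 can be scripted. Below is a minimal bash sketch. It does not come from the thread: the database names `users` and `users_repro`, the dump path `/tmp/users.dump`, the helper names, and the `DRY_RUN` switch are all assumptions. It also assumes a `pg_dump` that accepts `-f`, which current releases do; check your version's man page. Following Thomas's advice, it reloads into a new database rather than overwriting the original, then reruns the UPDATE from the report.

```shell
#!/usr/bin/env bash
# Sketch of Thomas's steps 1-3. All names below are assumptions:
# SRC_DB is the affected database, TEST_DB a brand-new copy of it.
SRC_DB=${SRC_DB:-users}
TEST_DB=${TEST_DB:-users_repro}
DUMP=${DUMP:-/tmp/users.dump}

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

reproduce_in_fresh_db() {
  # 1) dump the existing database to a file
  run pg_dump -f "$DUMP" "$SRC_DB" || return
  # 2) reload it into a new database rather than the original
  run createdb "$TEST_DB" || return
  run psql -d "$TEST_DB" -f "$DUMP" || return
  # 3) rerun the statement that crashed the backend in the report
  run psql -d "$TEST_DB" -c "update users_tbl set added=TRUE where username like 'neta%';"
}
```

Running `DRY_RUN=1 reproduce_in_fresh_db` prints the four commands without touching any database.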

Hope this helps.

                      - Thomas

Re: Updating multiple bool values crashes backend

From: Tom Lane
pgsql-bugs@postgreSQL.org writes:
> users=> update users_tbl set added=TRUE where username like 'neta%';
> pqReadData() -- backend closed the channel unexpectedly.

> bash$ tail ~postgres/server.log
> Server process (pid 23747) exited with status 11 at Tue Oct 24 13:52:29 2000

This backend crash should have left a core file in your database
directory (PGDATA/base/users/core).  Can you provide a backtrace
from that corefile using gdb?

            regards, tom lane

Re: Updating multiple bool values crashes backend

From: Sean Kelly
On Wed, 25 Oct 2000 12:52:44 -0400, Tom Lane said:

> pgsql-bugs@postgreSQL.org writes:
>  > users=> update users_tbl set added=TRUE where username like 'neta%';
>  > pqReadData() -- backend closed the channel unexpectedly.
>
>  > bash$ tail ~postgres/server.log
>  > Server process (pid 23747) exited with status 11 at Tue Oct 24 13:52:29 2000
>
>  This backend crash should have left a core file in your database
>  directory (PGDATA/base/users/core).  Can you provide a backtrace
>  from that corefile using gdb?

    No core there ... any other suggestions?  With respect to GCC
errors, '11' normally indicates a hardware problem - could this be
the case?  One of the machines I tested it on was brand new
hardware...

    Thanks,

Re: Updating multiple bool values crashes backend

From: Tom Lane
Sean Kelly <S.Kelly@ncl.ac.uk> writes:
>> This backend crash should have left a core file in your database
>> directory (PGDATA/base/users/core).  Can you provide a backtrace
>> from that corefile using gdb?

>     No core there ... any other suggestions?

You probably started the postmaster with a ulimit setting that prevents
coredumps (ulimit -c 0 or something like that, see your ulimit man page).
On some Unixen, this ulimit setting is the default for anything started
from a system boot script.  Restart the postmaster with ulimit -c
unlimited, either by starting it by hand or adding a ulimit call to the
boot script.  Then reproduce the crash to get a core file.
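Here is a minimal bash sketch of Tom's suggestion. The helper name `with_core_dumps` is hypothetical. The postmaster path and the `~postgres/data` and `~postgres/server.log` locations in the usage comment are inferred from paths in Sean's messages and may differ on other installations. An unprivileged user cannot raise the soft core limit above the hard limit, so the sketch falls back to the hard limit when `ulimit -c unlimited` is refused.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command (e.g. the postmaster) with core dumps enabled.
with_core_dumps() {
  (
    # Tom's suggestion is "ulimit -c unlimited"; without privileges the soft
    # limit cannot exceed the hard limit, so fall back to the hard limit.
    ulimit -c unlimited 2>/dev/null || ulimit -c "$(ulimit -H -c)"
    # Replace the subshell so the command inherits the raised limit.
    exec "$@"
  )
}

# Usage (paths inferred from this thread, adjust as needed):
#   with_core_dumps /usr/local/postgres/bin/postmaster -D ~postgres/data >~postgres/server.log 2>&1 &
```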

> With respect to GCC errors, '11' normally indicates a hardware
> problem

Uh, whoever told you that?  Signal 11 is SIGSEGV on most Unixen,
and that just means the program tried to dereference an invalid
pointer.  Almost certainly, we're looking at some software bug
here, not a hardware failure.
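To see which signal killed a backend, you can summarise the postmaster log with a short script. The sketch rests on one assumption. It treats the number after "exited with status" as the raw `wait()` status and decodes it with the Linux/glibc bit layout: the low 7 bits hold the terminating signal, bit 0x80 marks a core dump, and the next byte holds the exit code. That reading is consistent with Tom interpreting status 11 as SIGSEGV, but it has not been checked against the 7.0 postmaster source. The function name `crashed_backends` is hypothetical.

```shell
#!/usr/bin/env bash
# Summarise backend crashes recorded in the postmaster log.
# ASSUMPTION: the number after "exited with status" is the raw wait()
# status, decoded with the Linux/glibc layout: low 7 bits = terminating
# signal, bit 0x80 = core dumped, bits 8-15 = exit code.
crashed_backends() {
  sed -n 's/.*Server process (pid \([0-9][0-9]*\)) exited with status \([0-9][0-9]*\).*/\1 \2/p' |
  while read -r pid status; do
    sig=$(( status & 0x7f ))
    if [ "$sig" -ne 0 ]; then
      if [ $(( status & 0x80 )) -ne 0 ]; then core="core dumped"; else core="no core dumped"; fi
      # kill -l turns a signal number into its name (11 -> SEGV).
      printf 'pid %s: killed by signal %d (SIG%s), %s\n' "$pid" "$sig" "$(kill -l "$sig")" "$core"
    else
      printf 'pid %s: exited with code %d\n' "$pid" $(( (status >> 8) & 0xff ))
    fi
  done
}
```

Under that assumption, `crashed_backends < ~postgres/server.log` prints `pid 23747: killed by signal 11 (SIGSEGV), no core dumped` for the crash in the original report.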

            regards, tom lane

Re: Updating multiple bool values crashes backend

From: Sean Kelly
On Wed, 25 Oct 2000 14:14:22 -0400, Tom Lane said:

>  >     No core there ... any other suggestions?
>
>  You probably started the postmaster with a ulimit setting that prevents
>  coredumps (ulimit -c 0 or something like that, see your ulimit man page).
>  On some Unixen, this ulimit setting is the default for anything started
>  from a system boot script.  Restart the postmaster with ulimit -c
>  unlimited, either by starting it by hand or adding a ulimit call to the
>  boot script.  Then reproduce the crash to get a core file.

    Ok, I sorted that ... I now have a 2Mb core file.  Can you
explain how to 'backtrace' it with gdb ... I'm not really a developer
and haven't played with gdb much ... ever ...  I've stuck the core
file at http://www.randomfx.net/core.html if you need it.

    As someone suggested, I 'pg_dump'ed the database, 'dropdb'ed
and 'createdb'ed it, before reloading.    After reloading the results
were the same.    I tried this on both the machines running 7.0.2 with
the same results.

>  > With respect to GCC errors, '11' normally indicates a hardware
>  > problem
>
>  Uh, whoever told you that?  Signal 11 is SIGSEGV on most Unixen,
>  and that just means the program tried to dereference an invalid
>  pointer.  Almost certainly, we're looking at some software bug
>  here, not a hardware failure.

    One example can be found on http://www.bitwizard.nl/sig11/

    Thanks for your time and help,

Re: Updating multiple bool values crashes backend

From: Tom Lane
Sean Kelly <S.Kelly@ncl.ac.uk> writes:
>     Ok, I sorted that ... I now have a 2Mb core file.  Can you
> explain how to 'backtrace' it with gdb ...

    gdb /path/to/postgres-executable /path/to/core-file
    bt
    quit

and send the results.  Hopefully there will be at least function names
in the display --- if it's all numbers then don't bother sending it :-(
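You can run the same three commands non-interactively with gdb's `-batch` and `-ex` options, and you can automate Tom's "all numbers" check by looking for frames that carry a function name. Here is a bash sketch. It assumes gdb is on the PATH. `backtrace_core` and `bt_has_names` are hypothetical helper names, and the regular expression assumes gdb's usual `#N  0xADDR in name ()` frame format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a backtrace from a core file without an
# interactive gdb session (same effect as typing bt, then quit).
backtrace_core() {
  local exe=$1 core=$2
  if [ ! -f "$exe" ] || [ ! -f "$core" ]; then
    echo "usage: backtrace_core /path/to/postgres /path/to/core" >&2
    return 1
  fi
  # -batch makes gdb exit after running the -ex commands.
  gdb -batch -ex bt "$exe" "$core"
}

# Hypothetical check for Tom's "if it's all numbers" rule: succeed if any
# frame shows a function name, e.g. "#0  0x8115eb2 in ri_BuildQueryKeyFull ()",
# rather than only an address followed by "??".
bt_has_names() {
  grep -Eq '^#[0-9]+ +(0x[0-9a-f]+ in )?[A-Za-z_]'
}
```

For Sean's setup, that would be `backtrace_core bin/postgres data/base/users/core > bt.txt`, followed by `bt_has_names < bt.txt && echo "has function names"`.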

> I've stuck the core
> file at http://www.randomfx.net/core.html if you need it.

Thanks, but it's pretty much useless to anyone who doesn't have the
exact same executable and same system platform as you.

>>>>> With respect to GCC errors, '11' normally indicates a hardware
>>>>> problem
>>
>> Uh, whoever told you that?

>     One example can be found on http://www.bitwizard.nl/sig11/

Hmph.  bitwizard may think that flaky hardware is a normal state of
affairs, but I don't.  Perhaps he buys his machines from incompetent
manufacturers.

            regards, tom lane

Re: Updating multiple bool values crashes backend

From: Tom Lane
Sean Kelly <S.Kelly@ncl.ac.uk> writes:
> (gdb) bt
> #0  0x8115eb2 in ri_BuildQueryKeyFull ()
> #1  0x8115dc2 in RI_FKey_keyequal_upd ()
> #2  0x8096d7c in DeferredTriggerSaveEvent ()

Hmm.  There wasn't any mention of foreign keys for this table in your
bug report, now was there?

At a guess, you've run into the known bug that foreign key triggers
don't track renames of referenced tables.  Did you rename a table that
is a foreign-key referencer or referencee of this one?  If so, rename
it back, or drop and reload both tables.  (The crash is fixed for
7.0.3, though actually tracking the renames is further downstream.)

            regards, tom lane
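Tom's first workaround, renaming the table back, is a one-line ALTER TABLE. Here is a minimal bash sketch that assumes psql is available. `rename_back`, `valid_ident`, and `DRY_RUN` are hypothetical, and the table names are the ones Sean reports in his reply to this message. The identifier check keeps arbitrary text out of the SQL string.

```shell
#!/usr/bin/env bash
# Hypothetical helper for Tom's workaround: give a renamed foreign-key
# table its old name back. DRY_RUN=1 prints the SQL instead of running it.
valid_ident() {
  # Accept only plain lower-case identifiers, so no quoting is needed.
  [[ $1 =~ ^[a-z_][a-z0-9_]*$ ]]
}

rename_back() {
  local db=$1 current=$2 original=$3
  if ! valid_ident "$current" || ! valid_ident "$original"; then
    echo "rename_back: refusing unusual table name" >&2
    return 1
  fi
  local sql="ALTER TABLE $current RENAME TO $original;"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$sql"
  else
    psql -d "$db" -c "$sql"
  fi
}
```

With Sean's names, `rename_back users newname_tbl oldname_tbl` would run `ALTER TABLE newname_tbl RENAME TO oldname_tbl;` against the `users` database.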

Re: Updating multiple bool values crashes backend

From: Sean Kelly
On Thu, 26 Oct 2000 10:14:16 -0400, Tom Lane said:

>      gdb /path/to/postgres-executable /path/to/core-file
>      bt
>      quit

[postgres@nis-master] ~
   132: gdb bin/postgres data/base/users/core

This GDB was configured as "i386-slackware-linux"...
Core was generated by `/usr/local/postgres/bin/postgres localhost s'.
Program terminated with signal 11, Segmentation fault.
..
[SNIP: Loading symbols...]
..
#0  0x8115eb2 in ri_BuildQueryKeyFull ()
(gdb) bt
#0  0x8115eb2 in ri_BuildQueryKeyFull ()
#1  0x8115dc2 in RI_FKey_keyequal_upd ()
#2  0x8096d7c in DeferredTriggerSaveEvent ()
#3  0x8096016 in ExecARUpdateTriggers ()
#4  0x809c617 in ExecReplace ()
#5  0x809c256 in ExecutePlan ()
#6  0x809b8f3 in ExecutorRun ()
#7  0x80eb46a in ProcessQueryDesc ()
#8  0x80eb4d0 in ProcessQuery ()
#9  0x80ea153 in pg_exec_query_dest ()
#10 0x80ea033 in pg_exec_query ()
#11 0x80eaeec in PostgresMain ()
#12 0x80d565a in DoBackend ()
#13 0x80d523a in BackendStartup ()
#14 0x80d45ee in ServerLoop ()
#15 0x80d407c in PostmasterMain ()
#16 0x80ab115 in main ()
#17 0x400f9577 in __libc_start_main () from /lib/libc.so.6
(gdb) quit

    There we go :)    Thanks,

Re: Updating multiple bool values crashes backend

From: Sean Kelly
On Thu, 26 Oct 2000 11:27:22 -0400, Tom Lane said:

> Sean Kelly <S.Kelly@ncl.ac.uk> writes:
>  > (gdb) bt
>  > #0  0x8115eb2 in ri_BuildQueryKeyFull ()
>  > #1  0x8115dc2 in RI_FKey_keyequal_upd ()
>  > #2  0x8096d7c in DeferredTriggerSaveEvent ()
>
>  Hmm.  There wasn't any mention of foreign keys for this table in your
>  bug report, now was there?
>
>  At a guess, you've run into the known bug that foreign key triggers
>  don't track renames of referenced tables.  Did you rename a table that
>  is a foreign-key referencer or referencee of this one?  If so, rename
>  it back, or drop and reload both tables.  (The crash is fixed for
>  7.0.3, though actually tracking the renames is further downstream.)

    Ah ha!....  oldname_tbl referenced the primary key in
users_tbl, and oldname_tbl was renamed newname_tbl.  Is this the bug
you mean?...  When you say drop/reload both tables do you mean both
users_tbl and newname_tbl?...

    Thanks, I think it's nearly sorted now :)

Re: Updating multiple bool values crashes backend

From: Bruce Momjian
Do we need a TODO here?

> Sean Kelly <S.Kelly@ncl.ac.uk> writes:
> > (gdb) bt
> > #0  0x8115eb2 in ri_BuildQueryKeyFull ()
> > #1  0x8115dc2 in RI_FKey_keyequal_upd ()
> > #2  0x8096d7c in DeferredTriggerSaveEvent ()
>
> Hmm.  There wasn't any mention of foreign keys for this table in your
> bug report, now was there?
>
> At a guess, you've run into the known bug that foreign key triggers
> don't track renames of referenced tables.  Did you rename a table that
> is a foreign-key referencer or referencee of this one?  If so, rename
> it back, or drop and reload both tables.  (The crash is fixed for
> 7.0.3, though actually tracking the renames is further downstream.)
>
>             regards, tom lane
>


--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: Updating multiple bool values crashes backend

From: Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Do we need a TODO here?

None that we haven't got already, AFAIK.

            regards, tom lane

Re: Updating multiple bool values crashes backend

From: Darcy Buskermolen
Could it also be the result of a Cluster operation? I've seen strange
things related to functions/triggers on tables that I've clustered.


>> At a guess, you've run into the known bug that foreign key triggers
>> don't track renames of referenced tables.  Did you rename a table that
>> is a foreign-key referencer or referencee of this one?  If so, rename
>> it back, or drop and reload both tables.  (The crash is fixed for
>> 7.0.3, though actually tracking the renames is further downstream.)
>>
>>             regards, tom lane
>>

Re: Updating multiple bool values crashes backend

From: Sean Kelly
On Fri, 27 Oct 2000 11:55:24 -0700, Darcy Buskermolen said:

> Could it also be the result of a Cluster operation? I've seen strange
>  things related to functions/triggers on tables that I've clustered.

    For me it turned out to be, as Tom said, the renaming of a
table involving foreign keys.  I renamed the table back to what it
was, dropped it, and then recreated the new one with new foreign
keys.

    Thanks,