7.0RC1: possible query and backend problem - Mailing list pgsql-general

From Michael Blakeley
Subject 7.0RC1: possible query and backend problem
Date
Msg-id p04310103b523f49d6641@blakeley.com
Whole thread Raw
Responses Re: 7.0RC1: possible query and backend problem
Re: 7.0RC1: possible query and backend problem
List pgsql-general
I've just upgraded to 7.0RC1 on a Solaris 2.6+patches system (U5 with
256MB). I'm having some trouble with a brand-new query, so I don't
know if this is a new bug or an old one. It has two parts: query
execution and backend robustness.

First, I'm getting a crash with this query (via Apache mod_perl and
DBI, or via psql):

Table U: id varchar(128) primary key, stop datetime, start datetime
Table S1: id varchar(128) primary key, url varchar(512);
Table S2: id varchar(128) primary key, url varchar(512);

    SELECT id,date_part('epoch',sum(stop-start)),count(*),S1.url,S2.url
    FROM U WHERE 'now'::datetime-start<'1 month'::interval
    AND (id=S1.id OR id=S2.id) GROUP BY id,S1.url,S2.url;

The idea is to select whichever of S1.url and S2.url is non-null.

 From psql, I get:

    pqReadData() -- backend closed the channel unexpectedly.
         This probably means the backend terminated abnormally
         before or while processing the request.
    connection to server was lost

And now there are no postgres processes running.

I suspect that the problem is that, in my data set, either S1.url or
S2.url will be null for any given row, and this is causing problems
for GROUP BY (I can also believe that my SQL is bad - but that
shouldn't crash postgres, should it?). If I stick to just S1, the
query runs ok.

But... now there are no postgres processes running at all.

# ps -fu postgres
      UID   PID  PPID  C    STIME TTY      TIME CMD

In the postgres log:

>Server process (pid 11189) exited with status 139 at Wed Apr 19 16:37:25 2000
>Terminating any active server processes...
>NOTICE:  Message from PostgreSQL backend:
>         The Postmaster has informed me that some other backend died
>abnormally and possibly corrupted shared memory.
>         I have rolled back the current transaction and am going to
>terminate your database system connection and exit.
>         Please reconnect to the database system and repeat your query.
>Server processes were terminated at Wed Apr 19 16:37:25 2000
>Reinitializing shared memory and semaphores
>IpcMemoryCreate: shmget failed (Invalid argument) key=5432110,
>size=144, permission=700
>This type of error is usually caused by an improper
>shared memory or System V IPC semaphore configuration.
>For more information, see the FAQ and platform-specific
>FAQ's in the source directory pgsql/doc or on our
>web site at http://www.postgresql.org.
>IpcMemoryIdGet: shmget failed (Invalid argument) key=5432110,
>size=144, permission=0
>IpcMemoryAttach: shmat failed (Invalid argument) id=-2
>FATAL 1:  AttachSLockMemory: could not attach segment

Now, the real problem here is that postgres seems to be trying to
restart, and fails. It complains about shmget - my /etc/system says:
set shmsys:shminfo_shmmax =  0x40000000
set shmsys:shminfo_shmmni = 100
set shmsys:shminfo_shmseg = 32
set semsys:seminfo_semmns = 1024
set semsys:seminfo_semmni = 1024
set semsys:seminfo_semmnu = 1024
set semsys:seminfo_semmsl = 1024
set semsys:seminfo_semmap = 1024

So I don't think this is really a config problem (if it were, the db
wouldn't start up to begin with). Rather, I'd guess that postgres
isn't freeing a memory block during the "Terminating any active
server processes..." phase.

Running ipcs -a immediately after the crash shows several segments
(although oddly, I have no trouble starting the db again as root -
maybe there's a permissions issue at work here?):

# ipcs -a
IPC status from <running system> as of Wed Apr 19 16:45:42 2000
T         ID      KEY        MODE        OWNER    GROUP  CREATOR
CGROUP NATTCH      SEGSZ  CPID  LPID   ATIME    DTIME    CTIME
Shared Memory:
m        800   0x0052e32e --rw------- postgres postgres postgres
postgres      0        120 12737 12737 13:01:36 13:01:36 13:01:36
m        801   0x0052e325 --rw------- postgres postgres postgres
postgres      0   15373312 12737 12737 13:01:36 13:01:36 13:01:36
m        802   0x0052e32b --rw------- postgres postgres postgres
postgres      0     102184 12737 12737 13:01:36 13:01:36 13:01:36
m          3   0x500005a3 --rw-r--r--     root     root     root
root      1         68   232   232 15:29:35 15:29:35 15:29:35

Anyway, I hope this report is helpful. Thanks to the developers for
their hard work on PG7.

-- Mike

pgsql-general by date:

Previous
From: Tom Cook
Date:
Subject: Re: Sorry, I forgot put the exception in my message...
Next
From: Lincoln Yeoh
Date:
Subject: Re: Connecting website with SQL-database.....