Re: [HACKERS] Shared memory corruption? - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [HACKERS] Shared memory corruption?
Date
Msg-id 199802122010.PAA03196@candle.pha.pa.us
In response to Shared memory corruption?  (Tom I Helbekkmo <tih@Hamartun.Priv.NO>)
List pgsql-hackers
Vadim, I may need your help on this one.  I can reproduce it by running
the regression test, and doing a shell 'while' loop that continuously
creates databases:

    while :
    do
        sh -c 'createdb $$'
    done

I get the errors too.  I have no idea what the cause is.  I would hope it is
not the new deadlock code or the locking fixes I did.  I think the message
comes from smgrblindwrt.  Is it possible our new speedups are causing
it?
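
For anyone else trying to reproduce this, a bounded variant of the loop
may be easier to work with, since it stops at the first failure and
reports how far it got.  This is only a sketch: the function name
stress_createdb, the CREATEDB_CMD override, and the "stress_" database
name prefix are made up for illustration and are not part of the
distribution.

```shell
# Bounded variant of the createdb loop.  stress_createdb, CREATEDB_CMD and
# the "stress_" name prefix are illustrative assumptions only.
stress_createdb() {
    count=$1
    cmd=${CREATEDB_CMD:-createdb}   # command to run; defaults to createdb
    i=0
    while [ "$i" -lt "$count" ]
    do
        # $$ is the PID of this shell; adding the counter keeps each
        # database name unique within and across runs.
        if ! "$cmd" "stress_$$_$i"
        then
            echo "createdb failed on iteration $i" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "created $count databases without error"
}
```

Running "stress_createdb 50" while the regression test is going should
then stop at the first failed createdb, if one occurs, and say which
iteration it was.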



>
> [similar report submitted previously, but this is more complete]
>
> There is something that looks like shared memory corruption going on,
> which I first noticed by accident the other day, in the 1998-02-09
> snapshot.  It's still there today, with the 1998-02-12 one, and looks
> like the following on my Sun SS2 under NetBSD/sparc 1.3 (I've created
> a simple test case here, for easy testing elsewhere):
>
> First, I run initdb, start a postmaster, create a user 'tih', stop the
> postmaster, restart the postmaster with '-d', thus:
>
>  barsoom:postgres> postmaster -i -d
>  FindBackend: searching PATH ...
>  FindBackend: found "/usr/local/pgsql/bin/postgres" using PATH
>
> Next, I create a database 'words', thus:
>
>  barsoom:tih> createdb words
>  barsoom:tih>
>
> The postmaster says:
>
>  postmaster: BackendStartup: pid 6542 user tih db template1 socket 5
>  postmaster: reaping dead processes...
>  postmaster: CleanupProc: pid 6542 exited with status 0
>
> I fire up psql, thus:
>
>  barsoom:tih> psql words
>  words=>
>
> The postmaster goes:
>
>  postmaster: BackendStartup: pid 6549 user tih db words socket 5
>
> In psql, I then do the following:
>
>  words=> create table dictionary (entry char(64));
>  CREATE
>  words=> create unique index dict_by_entry on dictionary (entry);
>  CREATE
>  words=> copy dictionary from '/usr/share/dict/words';
>
> The postmaster generates no output at this, and the copy starts as it
> should.  There is much disk activity.  Next, while this is running, in
> another terminal window, as the same user 'tih', I do:
>
>  barsoom:tih> createdb
>  Connection to database 'template1' failed.
>  PQexec() -- There is no connection to the backend.
>  createdb: database creation failed on tih.
>  barsoom:tih>
>
> When this happens, the postmaster generates the following output:
>
>  postmaster: BackendStartup: pid 6560 user tih db template1 socket 5
>  ERROR:  cannot write block 171 of dict_by_entry [words] blind
>  postmaster: reaping dead processes...
>  postmaster: CleanupProc: pid 6560 exited with status 0
>
> Looking at processes running on the system at this time, I see:
>
>   6549 p6  R+ 2:01.88 /usr/local/pgsql/bin/postgres -p -Q -P5 -v 65536 words
>
> This is the backend doing the copy.  It is spinning furiously, eating
> CPU like there was no tomorrow -- but there is no more disk activity.
> The terminal window where I initiated the copy operation looks as
> though it were proceeding normally.  So now I attempt to perform the
> database creation again, thus (in the second terminal):
>
>  barsoom:tih> createdb
>
> Nothing happens -- it just hangs there.  The postmaster says:
>
>  postmaster: BackendStartup: pid 6595 user tih db template1 socket 5
>
> Looking with ps again, I can see that this backend is now also running
> wild, sharing the CPU half and half with the one with PID 6549...
>
> Note that I'm trying to create a different database when it breaks;
> the only possible interaction is through the shared memory that I
> understand is maintained by the postmaster on behalf of the backends.
> As for seeing this on other platforms, I certainly hope it's
> repeatable elsewhere, but it's not unreasonable to assume that it
> could cause different symptoms on other platforms, including quiet
> data corruption...
>
> The whole thing is completely repeatable here -- any ideas can be
> verified quickly and easily -- and with enthusiasm.  :-)
>
> -tih
> --
> Popularity is the hallmark of mediocrity.  --Niles Crane, "Frasier"
>
>
>


--
Bruce Momjian
maillist@candle.pha.pa.us
