Re: 7.0.3 database corruption - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: 7.0.3 database corruption
Msg-id 3B28CB88.57B9442@tm.ee
In response to 7.0.3 database corruption  (mlw <markw@mohawksoft.com>)
Responses Re: 7.0.3 database corruption  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> 
> > mlw wrote:
> >> After we run the
> >> scripts, it looks like the database is corrupt.
> 
> It's impossible to say anything useful with such an undescriptive
> description of the problem.
> 
> Hannu Krosing <hannu@tm.ee> writes:
> > There certainly are bugs in 7.0.3 - I can describe at least two:
> 
> I would really like to see a reproducible example of index corruption
> in 7.0.*.  We've heard such reports often enough to know the problem
> is real, but without a test case in hand it's difficult to do much about
> it.

I know ;( Unfortunately this has happened only a few times on some quite 
busy servers receiving a workload of quite varied queries.

> > 2. Some kind of stuck locks - a single backend stuck in "INSERT waiting"
> 
> 7.0.*'s deadlock detection algorithm is known to have some holes, but
> deadlock couldn't be the explanation for just a single stuck backend.

This is what the "ps ax| grep post" output looks like in my logs:

Sun Jun 10 06:31:00 EET 2001
  828 ?        S      0:02 /usr/bin/postmaster -i -o -F
26652 ?        S      5:20 /usr/bin/postgres localhost gamer casino idle
30082 ?        S      0:20 /usr/bin/postgres 127.0.0.1 nobody casino idle
30084 ?        S      1:26 /usr/bin/postgres 127.0.0.1 nobody casino idle
31565 ?        S      0:43 /usr/bin/postgres 127.0.0.1 nobody casino idle
31595 ?        S      0:19 /usr/bin/postgres 127.0.0.1 nobody casino idle
31596 ?        S      0:21 /usr/bin/postgres 127.0.0.1 nobody casino idle
31597 ?        S      0:31 /usr/bin/postgres 127.0.0.1 nobody casino idle
31598 ?        S      1:39 /usr/bin/postgres 127.0.0.1 nobody casino idle
31600 ?        S      0:17 /usr/bin/postgres 127.0.0.1 nobody casino idle
31608 ?        S      0:24 /usr/bin/postgres 127.0.0.1 nobody casino idle
31612 ?        S      0:24 /usr/bin/postgres 127.0.0.1 nobody casino idle
32080 ?        S      0:43 /usr/bin/postgres localhost gamer casino UPDATE waiti
32706 ?        S      0:10 /usr/bin/postgres localhost gamer casino idle
  302 ?        S      0:00 /usr/bin/postgres 127.0.0.1 nobody casino idle
  361 ?        S      0:00 sh -c date;ps ax|grep post
  364 ?        S      0:00 grep post

CHECKING WAITING PIDS: ['32080']
Sun Jun 10 06:31:10 EET 2001
  828 ?        S      0:02 /usr/bin/postmaster -i -o -F
26652 ?        S      5:20 /usr/bin/postgres localhost gamer casino idle
30082 ?        S      0:20 /usr/bin/postgres 127.0.0.1 nobody casino idle
30084 ?        S      1:26 /usr/bin/postgres 127.0.0.1 nobody casino idle
31565 ?        S      0:43 /usr/bin/postgres 127.0.0.1 nobody casino idle
31595 ?        S      0:19 /usr/bin/postgres 127.0.0.1 nobody casino idle
31596 ?        S      0:21 /usr/bin/postgres 127.0.0.1 nobody casino idle
31597 ?        S      0:31 /usr/bin/postgres 127.0.0.1 nobody casino idle
31598 ?        S      1:39 /usr/bin/postgres 127.0.0.1 nobody casino idle
31600 ?        S      0:17 /usr/bin/postgres 127.0.0.1 nobody casino idle
31608 ?        S      0:24 /usr/bin/postgres 127.0.0.1 nobody casino idle
31612 ?        S      0:24 /usr/bin/postgres 127.0.0.1 nobody casino idle
32080 ?        S      0:43 /usr/bin/postgres localhost gamer casino UPDATE waiti
32706 ?        S      0:10 /usr/bin/postgres localhost gamer casino idle
  302 ?        S      0:00 /usr/bin/postgres 127.0.0.1 nobody casino idle
  365 ?        S      0:00 sh -c date;ps ax|grep post
  368 ?        S      0:00 grep post

PROCESS 32080 STILL WAITING, RESTART TIME
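
For reference, the check behind the "CHECKING WAITING PIDS" and "STILL
WAITING" lines above can be sketched roughly as follows. This is a minimal
Python sketch and not the actual monitoring script: the names waiting_pids,
still_waiting and snapshot are hypothetical, and it assumes each process
occupies one line of "ps ax" output and that ps may truncate the status
text (as in "UPDATE waiti").

```python
import re
import subprocess

# One `ps ax` line for a postgres backend whose status ends in
# "INSERT wait..." or "UPDATE wait..." (ps may cut the word short).
WAITING_RE = re.compile(
    r"^\s*(\d+)\s+\S+\s+\S+\s+\S+\s+\S*postgres\b.*"
    r"\b(?:INSERT|UPDATE)\s+wait\w*\s*$"
)


def waiting_pids(ps_output):
    """Return the set of backend PIDs (as strings) shown as waiting."""
    pids = set()
    for line in ps_output.splitlines():
        match = WAITING_RE.match(line)
        if match:
            pids.add(match.group(1))
    return pids


def still_waiting(first_snapshot, second_snapshot):
    """Return sorted PIDs that are waiting in both snapshots."""
    return sorted(waiting_pids(first_snapshot) & waiting_pids(second_snapshot))


def snapshot():
    """Capture the current `ps ax` output as text (assumes ps is on PATH)."""
    return subprocess.run(
        ["ps", "ax"], capture_output=True, text=True, check=True
    ).stdout
```

Taking one snapshot(), sleeping 10 seconds and taking another gives the two
inputs for still_waiting(); what to do with a PID it returns (here, a
restart) is left to the caller.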


> Again, any chance of looking at an example?

I could send you the tails of the postgres logfiles, which are rotated on
detecting an INSERT/UPDATE wait condition that does not go away in 10 sec.
How long a stretch of logfile (in time) would be enough?

There seems to be no general pattern that leads to it though ;(

---------------
Hannu

