Thread: missing pg_clog files ?

missing pg_clog files ?

From
Patrick Welche
Date:
There was a thread on missing pg_clog files caused due to dodgy practices in
glibc *last year*. I am seeing something similar *now* with a server
PostgreSQL 7.4beta1 on i386-unknown-netbsdelf1.6X, compiled by GCC 2.95.3

accessed by a similar client and a client
PostgreSQL 7.4devel on i686-pc-linux-gnu, compiled by GCC 2.95.4


The following works:
 select * from olddata02_03vac offset 2573718 limit 1;

however
 select * from olddata02_03vac offset 2573719 limit 1;
ERROR:  could not access status of transaction 1664158221
DETAIL:  open of file "/usr/local/pgsql/data/pg_clog/0633" failed: No such file or directory

and

# ls -l pg_clog
total 32
-rw-------  1 postgres  postgres  16384 Sep 22 13:12 0000
#


Is it true that the problem was with the server, so the fact that a glibc
client was connecting and possibly doing a vacuum is irrelevant?

What can I do now to fix it? Known problem with beta1 which is now old?

Cheers,

Patrick


Re: missing pg_clog files ?

From
Tom Lane
Date:
Patrick Welche <prlw1@newn.cam.ac.uk> writes:
>   select * from olddata02_03vac offset 2573719 limit 1;
> ERROR:  could not access status of transaction 1664158221
> DETAIL:  open of file "/usr/local/pgsql/data/pg_clog/0633" failed: No such file or directory

> # ls -l pg_clog
> total 32
> -rw-------  1 postgres  postgres  16384 Sep 22 13:12 0000

What you have here is a corrupted tuple (viz, a silly transaction number).

It would be useful to look at the page containing the tuple to see if
any pattern can be detected in the corruption.  To do this, get the
ctid of the prior tuple:select ctid from olddata02_03vac offset 2573718 limit 1;
This will give you a result "(blocknumber,tuplenumber)".  The bogus
tuple is probably on the same page, though possibly further along.
Next find a dump tool --- I usually use Red Hat's pg_filedump:http://sources.redhat.com/rhdb/tools.html
Dump out the page(s) in question and send them along.
        regards, tom lane


Re: missing pg_clog files ?

From
Patrick Welche
Date:
On Mon, Sep 22, 2003 at 10:50:22AM -0400, Tom Lane wrote:
> Patrick Welche <prlw1@newn.cam.ac.uk> writes:
> >   select * from olddata02_03vac offset 2573719 limit 1;
> > ERROR:  could not access status of transaction 1664158221
> > DETAIL:  open of file "/usr/local/pgsql/data/pg_clog/0633" failed: No such file or directory
> 
> > # ls -l pg_clog
> > total 32
> > -rw-------  1 postgres  postgres  16384 Sep 22 13:12 0000
> 
> What you have here is a corrupted tuple (viz, a silly transaction number).
> 
> It would be useful to look at the page containing the tuple to see if
> any pattern can be detected in the corruption.  To do this, get the
> ctid of the prior tuple:
>     select ctid from olddata02_03vac offset 2573718 limit 1;
(71716,15)

> This will give you a result "(blocknumber,tuplenumber)".  The bogus
> tuple is probably on the same page, though possibly further along.
> Next find a dump tool --- I usually use Red Hat's pg_filedump:
>     http://sources.redhat.com/rhdb/tools.html
> Dump out the page(s) in question and send them along.

I hope I guessed the right syntax...

% pg_filedump -R 71716 data/base/17148/283342

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 2.0-Alpha
*
* File: data/base/17148/283342
* Options used: -R 71716 
*
* Dump created on: Mon Sep 22 16:21:29 2003
*******************************************************************

Block 71716 ********************************************************
<Header> -----Block Offset: 0x23048000         Offsets: Lower     176 (0x00b0)Block: Size 8192  Version    1
Upper    236 (0x00ec)LSN:  logid      1 recoff 0xd308022c      Special  8192 (0x2000)Items:   39                   Free
Space:  60Length (including item array): 180
 

<Data> ------ Item   1 -- Length:  204  Offset: 7988 (0x1f34)  Flags: USEDItem   2 -- Length:  204  Offset: 7784
(0x1e68) Flags: USEDItem   3 -- Length:  204  Offset: 7580 (0x1d9c)  Flags: USEDItem   4 -- Length:  204  Offset: 7376
(0x1cd0) Flags: USEDItem   5 -- Length:  204  Offset: 7172 (0x1c04)  Flags: USEDItem   6 -- Length:  204  Offset: 6968
(0x1b38) Flags: USEDItem   7 -- Length:  204  Offset: 6764 (0x1a6c)  Flags: USEDItem   8 -- Length:  204  Offset: 6560
(0x19a0) Flags: USEDItem   9 -- Length:  204  Offset: 6356 (0x18d4)  Flags: USEDItem  10 -- Length:  204  Offset: 6152
(0x1808) Flags: USEDItem  11 -- Length:  204  Offset: 5948 (0x173c)  Flags: USEDItem  12 -- Length:  204  Offset: 5744
(0x1670) Flags: USEDItem  13 -- Length:  204  Offset: 5540 (0x15a4)  Flags: USEDItem  14 -- Length:  204  Offset: 5336
(0x14d8) Flags: USEDItem  15 -- Length:  204  Offset: 5132 (0x140c)  Flags: USEDItem  16 -- Length:  204  Offset: 4928
(0x1340) Flags: USEDItem  17 -- Length:  204  Offset: 4724 (0x1274)  Flags: USEDItem  18 -- Length:  204  Offset: 4520
(0x11a8) Flags: USEDItem  19 -- Length:  204  Offset: 4316 (0x10dc)  Flags: USEDItem  20 -- Length:  204  Offset: 4112
(0x1010) Flags: USEDItem  21 -- Length:  204  Offset: 3908 (0x0f44)  Flags: USEDItem  22 -- Length:  204  Offset: 3704
(0x0e78) Flags: USEDItem  23 -- Length:  204  Offset: 3500 (0x0dac)  Flags: USEDItem  24 -- Length:  204  Offset: 3296
(0x0ce0) Flags: USEDItem  25 -- Length:  204  Offset: 3092 (0x0c14)  Flags: USEDItem  26 -- Length:  204  Offset: 2888
(0x0b48) Flags: USEDItem  27 -- Length:  204  Offset: 2684 (0x0a7c)  Flags: USEDItem  28 -- Length:  204  Offset: 2480
(0x09b0) Flags: USEDItem  29 -- Length:  204  Offset: 2276 (0x08e4)  Flags: USEDItem  30 -- Length:  204  Offset: 2072
(0x0818) Flags: USEDItem  31 -- Length:  204  Offset: 1868 (0x074c)  Flags: USEDItem  32 -- Length:  204  Offset: 1664
(0x0680) Flags: USEDItem  33 -- Length:  204  Offset: 1460 (0x05b4)  Flags: USEDItem  34 -- Length:  204  Offset: 1256
(0x04e8) Flags: USEDItem  35 -- Length:  204  Offset: 1052 (0x041c)  Flags: USEDItem  36 -- Length:  204  Offset:  848
(0x0350) Flags: USEDItem  37 -- Length:  204  Offset:  644 (0x0284)  Flags: USEDItem  38 -- Length:  204  Offset:  440
(0x01b8) Flags: USEDItem  39 -- Length:  204  Offset:  236 (0x00ec)  Flags: USED
 


*** End of Requested Range Encountered. Last Block Read: 71716 ***


Cheers,

Patrick


Re: missing pg_clog files ?

From
Tom Lane
Date:
Patrick Welche <prlw1@newn.cam.ac.uk> writes:
> I hope I guessed the right syntax...
> % pg_filedump -R 71716 data/base/17148/283342

Yes, but this doesn't give all the available info.  Add -i and -f
options.  A plain -d dump might be interesting too.
        regards, tom lane


Re: missing pg_clog files ?

From
Patrick Welche
Date:
On Mon, Sep 22, 2003 at 11:33:30AM -0400, Tom Lane wrote:
> Patrick Welche <prlw1@newn.cam.ac.uk> writes:
> > I hope I guessed the right syntax...
> > % pg_filedump -R 71716 data/base/17148/283342
> 
> Yes, but this doesn't give all the available info.  Add -i and -f
> options.  A plain -d dump might be interesting too.

Indeed, the plain -d dump says that I have a chunk of /var/mail/prlw1
in 1000-13ff. No wonder postgres complained!

Highlight:
 0fe0: 06000000 00000000 00000000 00000000  ................ 0ff0: 01000000 3e000000 00000000 00000000
....>...........1000: 52657475 726e2d70 6174683a 203c7072  Return-path: <pr 1010: 6c773140 6e65776e 2e63616d 2e61632e
lw1@newn.cam.ac.
... 13e0: 38323020 20202020 37313139 38202020  820     71198    13f0: 20323425 20202020 32303637 20202032   24%    2067
 2 1400: 3e000000 00000000 03000000 b6090000  >...........¶... 1410: 00000000 00000000 01002418 0f001a00
..........$.....

Would you be interested in the full dump anyway? It seems this is trashed
and I need to bring out the backups, right? Next is speculation as to how?
I read a very large mail file with mutt which I think uses mmap. It still
begs the question how did that end up in the database.. Worth reloading into
same database server, or upgrade to current cvs?
NetBSD-1.6ZC/i386 with 2Gb memory.

Thanks for the help!

Cheers,

Patrick


Re: missing pg_clog files ?

From
Tom Lane
Date:
Patrick Welche <prlw1@newn.cam.ac.uk> writes:
> Indeed, the plain -d dump says that I have a chunk of /var/mail/prlw1
> in 1000-13ff. No wonder postgres complained!

Yipes.  We have seen this sort of thing once or twice in the past.
I don't know whether you are looking at a disk drive fault (dropping
the right data onto the wrong sector) or a kernel fault (getting
confused about which buffer holds which file block), but I believe
it's one or the other.  It's hard to see how anything at the application
level could have gotten those two files switched.  You might look to see
if there are any updates available for your kernel.

As for recovery, you probably want to drop and reload at least that one
table.  Whether it's worth a complete reload is your call.
        regards, tom lane


Re: missing pg_clog files ?

From
Alvaro Herrera
Date:
On Mon, Sep 22, 2003 at 05:03:28PM +0100, Patrick Welche wrote:
> On Mon, Sep 22, 2003 at 11:33:30AM -0400, Tom Lane wrote:
> > Patrick Welche <prlw1@newn.cam.ac.uk> writes:
> > > I hope I guessed the right syntax...
> > > % pg_filedump -R 71716 data/base/17148/283342
> > 
> > Yes, but this doesn't give all the available info.  Add -i and -f
> > options.  A plain -d dump might be interesting too.
> 
> Indeed, the plain -d dump says that I have a chunk of /var/mail/prlw1
> in 1000-13ff. No wonder postgres complained!

For the record, what filesystem is this on?  Is it ReiserFS by any
chance?

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Aprende a avergonzarte mas ante ti que ante los demas" (Democrito)


Re: missing pg_clog files ?

From
Patrick Welche
Date:
On Mon, Sep 22, 2003 at 02:21:43PM -0400, Alvaro Herrera wrote:
> On Mon, Sep 22, 2003 at 05:03:28PM +0100, Patrick Welche wrote:
> > On Mon, Sep 22, 2003 at 11:33:30AM -0400, Tom Lane wrote:
> > > Patrick Welche <prlw1@newn.cam.ac.uk> writes:
> > > > I hope I guessed the right syntax...
> > > > % pg_filedump -R 71716 data/base/17148/283342
> > > 
> > > Yes, but this doesn't give all the available info.  Add -i and -f
> > > options.  A plain -d dump might be interesting too.
> > 
> > Indeed, the plain -d dump says that I have a chunk of /var/mail/prlw1
> > in 1000-13ff. No wonder postgres complained!
> 
> For the record, what filesystem is this on?  Is it ReiserFS by any
> chance?

Nope, ffs with soft dependencies, on a 1 month old IDE drive (read not
yet known good...) Hmm maybe I'd better ask over on the NetBSD list,
though I think my kernel is from just before ide rototill which was actually
on atapi rather than straight ide AFAICT, and certainly before gcc 3.3.1,
so the duff hardware scenario seems best..

Cheers,

Patrick


Re: missing pg_clog files ?

From
Bruce Momjian
Date:
Tom Lane wrote:
> Patrick Welche <prlw1@newn.cam.ac.uk> writes:
> > Indeed, the plain -d dump says that I have a chunk of /var/mail/prlw1
> > in 1000-13ff. No wonder postgres complained!
> 
> Yipes.  We have seen this sort of thing once or twice in the past.
> I don't know whether you are looking at a disk drive fault (dropping
> the right data onto the wrong sector) or a kernel fault (getting
> confused about which buffer holds which file block), but I believe
> it's one or the other.  It's hard to see how anything at the application
> level could have gotten those two files switched.  You might look to see
> if there are any updates available for your kernel.
> 
> As for recovery, you probably want to drop and reload at least that one
> table.  Whether it's worth a complete reload is your call.

Or just delete the row with that tid and see if you can access the other
data.  You might be able to get away with just restoring that one row
from backup.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073