Thread: pg_clog (?) problem with VACUUM

pg_clog (?) problem with VACUUM

From
Gerardo Herzig
Date:
Hi all. Well, I'm having this problem for the first time.
When executing a VACUUM, I get this message:
gse_new_version=# VACUUM ;
ERROR:  could not access status of transaction 118
DETAIL:  could not open file "pg_clog/0000": No existe el fichero o el 
directorio

Ok, wtf, I'll create the file. So touch pg_clog/0000 and VACUUM again...
ERROR:  could not access status of transaction 118
DETAIL:  could not read from file "pg_clog/0000" at offset 0: Conseguido

I hope the error is still clear despite my Spanish locale: "Conseguido" would
be something like "accomplished" or "obtained" (apparently the locale's
rendering of "Success").

The database was created using a "WITH TEMPLATE = gse" clause.
Vacuuming the 'gse' database itself gives me no errors at all.

Here are the final lines from strace when trying to vacuum:
lseek(20, 0, SEEK_SET)                  = 0
read(20, "", 8192)                      = 0
close(20)                               = 0
rt_sigprocmask(SIG_SETMASK, ~[QUIT ILL TRAP ABRT BUS FPE KILL SEGV CONT STOP SYS RTMIN RT_1], NULL, 8) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=377, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=377, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=377, ...}) = 0
write(2, "<2007-11-27 16:54:23 ART, gse_ne"..., 346) = 346
sendto(8, "E\0\0\0\240SERROR\0CXX000\0Mcould not ac"..., 161, 0, NULL, 0) = 161
munmap(0x2aaabffa9000, 839680)          = 0
munmap(0x2aaabdfa8000, 33558528)        = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
sendto(7, "\4\0\0\0\310\3\0\0\1\0\0\0\2731\0\0 S\255\0\r\0\0\0+\0"..., 968, 0, NULL, 0) = 968
sendto(7, "\4\0\0\0\310\3\0\0\1\0\0\0\2731\0\0 S\255\0\r\0\0\0\0\0"..., 968, 0, NULL, 0) = 968
sendto(7, "\4\0\0\0\310\3\0\0\1\0\0\0\2731\0\0 S\255\0\r\0\0\0\0\0"..., 968, 0, NULL, 0) = 968
sendto(7, "\4\0\0\0\310\3\0\0\1\0\0\0\2731\0\0 S\255\0\r\0\0\0\0\0"..., 968, 0, NULL, 0) = 968
sendto(7, "\4\0\0\0\370\0\0\0\1\0\0\0\2731\0\0 S\255\0\3\0\0\0\0\0"..., 248, 0, NULL, 0) = 248
sendto(7, "\4\0\0\0008\3\0\0\1\0\0\0\2731\0\0\0\0\0\0\v\0\0\0\0\0"..., 824, 0, NULL, 0) = 824
sendto(7, "\3\0\0\0\27\0\0\0\1\0\0\0\2731\0\0<IDLE>\0", 23, 0, NULL, 0) = 23
sendto(8, "Z\0\0\0\5I", 6, 0, NULL, 0)  = 6
recvfrom(8,

Don't know what to do now.
Wait... I have some more.
When vacuuming one table at a time, I found the table that makes VACUUM fail.
It's this one:
gse_new_version=# \d words
            Table "public.words"
    Column     |       Type        | Modifiers
---------------+-------------------+-----------
 page_id       | integer           |
 word          | character varying |
 word_position | integer           |
 original_word | character varying |
Indexes:
    "words_idx" btree (upper(word::text))
    "words_page_id_idx" btree (page_id)
    "words_page_id_word_position_id" btree (page_id, word_position) CLUSTER
    "words_upper_idx" btree (upper(word::text) varchar_pattern_ops)

I'm suspicious about the CLUSTER (don't ask why :-). Meanwhile I'm looking
into 'unclustering' the table and trying again.
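In case anyone wants to try the same, 'unclustering' here just means clearing the CLUSTER marker on the index. A sketch (ALTER TABLE ... SET WITHOUT CLUSTER exists in 8.1; it only drops the flag and does not rewrite or repair any data):

```sql
-- Drops only the CLUSTER marker from "words"; no data is touched.
ALTER TABLE words SET WITHOUT CLUSTER;
```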

Server version: 8.1.2
On SuSE 9.1

Any other ideas would be appreciated.
Thanks!!

Gerardo.


Re: pg_clog (?) problem with VACUUM

From
Gregory Stark
Date:
"Gerardo Herzig" <gherzig@fmed.uba.ar> writes:

> Hi all. Well, I'm having this problem for the first time.
> When executing a VACUUM, I get this message:
> gse_new_version=# VACUUM ;
> ERROR:  could not access status of transaction 118
> DETAIL:  could not open file "pg_clog/0000": No existe el fichero o el
> directorio

This is the typical error you see when data has been corrupted by bad memory
or a bad disk: VACUUM is trying to check whether the record's transaction was
committed or aborted and finds a bogus transaction id.

(Except in this case I'm not sure that the transaction id is necessarily
bogus. It's awfully early though. Was the table in the template database
created soon after running initdb? And then you're creating a copy of the
database much later?)

It's also possible it's one of the various bugs fixed in the bug-fix releases
8.1.3 through 8.1.10. I don't see any related to clog but there are some
related to vacuum which could cause data corruption.

But aside from upgrading immediately to 8.1.10, I would suggest you run
memtest86 (assuming it's an x86 machine) or some other memory checker. Bad
memory is quite common and could easily cause this. Also run a SMART tool
to check the disks.


> Ok, wtf, I'll create the file. So touch pg_clog/0000 and VACUUM again...
> ERROR:  could not access status of transaction 118
> DETAIL:  could not read from file "pg_clog/0000" at offset 0: Conseguido

That's almost the right idea for a last-ditch attempt to extract what data you
can from a corrupted table, but you have to fill the file with nul bytes.
Something like dd if=/dev/zero of=0000 bs=1k count=nnn, where nnn is, uh, I'm
not sure how large; it won't take much to cover transaction id 118 though.
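To put a number on "how large": clog stores two status bits per transaction, so a single 8 kB page covers 32768 transaction ids, far more than 118. A sketch of the arithmetic and the dd call, assuming the 8.1-era clog layout (run it inside pg_clog with the server stopped; zeroed status bits read back as "in progress", so treat this strictly as last-ditch recovery, not a fix):

```shell
# Assumed 8.1-era clog layout: 2 status bits per xact -> 4 xacts per byte,
# 8192-byte pages. One zeroed page therefore covers xids 0..32767.
xid=118
byte=$(( xid / 4 ))     # byte offset of xid 118 within the first clog page
echo "xid $xid lives at byte $byte of pg_clog/0000"
# Create one fully zeroed 8 kB clog page (server must be stopped):
dd if=/dev/zero of=0000 bs=8192 count=1
```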


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning


Re: pg_clog (?) problem with VACUUM

From
Gerardo Herzig
Date:
Gregory Stark wrote:

>"Gerardo Herzig" <gherzig@fmed.uba.ar> writes:
>
>>Hi all. Well, I'm having this problem for the first time.
>>When executing a VACUUM, I get this message:
>>gse_new_version=# VACUUM ;
>>ERROR:  could not access status of transaction 118
>>DETAIL:  could not open file "pg_clog/0000": No existe el fichero o el
>>directorio
>
>[...]
>
>>Ok, wtf, I'll create the file. So touch pg_clog/0000 and VACUUM again...
>>ERROR:  could not access status of transaction 118
>>DETAIL:  could not read from file "pg_clog/0000" at offset 0: Conseguido
>
>That's almost the right idea for a last-ditch attempt to extract what data you
>can from a corrupted table, but you have to fill the file with nul bytes.
>Something like dd if=/dev/zero of=0000 bs=1k count=nnn, where nnn is, uh, I'm
>not sure how large; it won't take much to cover transaction id 118 though.

Mmmm, yeah, after dd'ing the 0000 file, VACUUM executes just fine!!
BTW, a previous attempt to run CLUSTER gave me the same error as
VACUUM, so I don't think it was a VACUUM issue.

Thanks Gregory for your help!!

Gerardo


Re: pg_clog (?) problem with VACUUM

From
"Stefan Becker"
Date:
Dear Gerardo and Pgsql-List,

>When executing a VACUUM, I get this message:
>gse_new_version=# VACUUM ;
>ERROR:  could not access status of transaction 118
>DETAIL:  could not open file "pg_clog/0000": No existe el fichero o el

I had a similar problem. One day I decided to "clean out" the
pg_clog/ directory of the database cluster. The "last modified"
timestamps on many of the files were quite old, so I deleted them.
This had no apparent consequences for the data (dumps, indexes, etc.)
or the operation of the installation, except, as I found out MUCH
later, that I could no longer vacuum the database.

I restored the missing files from a backup and everything was okay
again; after a vacuum of the database, the system cleaned out the
older pg_clog/ files on its own.

I hope you get things fixed!

My best regards,

Stefan Becker
Email: Stefan@yukonho.de