Re: pgsql 8.3.3 "Invalid page header" leads to "FATAL: segment too big" - Mailing list pgsql-admin
From | Achilleas Mantzios
---|---
Subject | Re: pgsql 8.3.3 "Invalid page header" leads to "FATAL: segment too big"
Date |
Msg-id | 201004231206.38741.achill@matrix.gatewaynet.com
In response to | Re: pgsql 8.3.3 "Invalid page header" leads to "FATAL: segment too big" (Cédric Villemain <cedric.villemain.debian@gmail.com>)
List | pgsql-admin
On Thursday 22 April 2010 19:02:00, Cédric Villemain wrote:
> 2010/4/22 Achilleas Mantzios <achill@matrix.gatewaynet.com>:
> > On Thursday 22 April 2010 16:53:05, Cédric Villemain wrote:
> >> 2010/4/22 Achilleas Mantzios <achill@matrix.gatewaynet.com>:
> >> > Hello,
> >> > i have this serious problem in one of our remote vessels. (comm is done by minicom to the remote satellite modem)
> >> > I think that this server was under some sort of constant resets or hardware failures.
> >> > Initially, i had this problem:
> >> > ERROR: invalid page header in block 672720 of relation "pg_toast_125716009"
> >> >
> >> > This toast table corresponds to a table named "mail_message",
> >> >           Table "public.mail_message"
> >> >   Column   |       Type        |                         Modifiers
> >> > -----------+-------------------+-----------------------------------------------------------
> >> >  msgno     | mail_msgno_domain | not null default nextval('mail_msgno_sequence'::regclass)
> >> >  msgsource | bytea             |
> >> > Indexes:
> >> >     "mail_message_key" PRIMARY KEY, btree (msgno)
> >> >
> >> > (obviously the TOAST table serves the msgsource varlena), the contents of which are not of vital importance.
> >> > I tried REINDEXing, with no success, and after that i tried resetting the said block on disk as per this
> >> > suggestion by Tom here: http://old.nabble.com/invalid-page-header-td11981154.html
> >> >
> >> > i found the oid of the table:
> >> > SELECT tableoid,1 from pg_toast.pg_toast_125716009 order by chunk_id LIMIT 1
> >> >  tableoid  | ?column?
> >> > -----------+----------
> >> >  125716013 |        1
> >> >
> >> > (and just to verify)
> >> > SELECT relname from pg_class where oid=125716013;
> >> >       relname
> >> > --------------------
> >> >  pg_toast_125716009
> >> >
> >> > Then i did (as i said, i do not need the contents of msgsource - yet the contents of msgno are vital):
> >> > dd if=/dev/zero of=/usr/local/var/lib/pgsql/data/base/125714957/125716013 seek=672720 bs=8192 count=1
> >>
> >> Segments have a maximum size of 1GB. You have to find in which segment the
> >> faulty block is, and readjust the block value from the error report to
> >> the real one in that segment.
> >>
> >
> > Thanks,
> > Taking a look at /usr/local/src/postgresql-8.3.3/src/backend/storage/smgr/md.c
> > i see that the error comes from function mdnblocks:
> >
> >         if (nblocks > ((BlockNumber) RELSEG_SIZE))
> >                 elog(FATAL, "segment too big");
> >
> > That means that some segment file is bigger than RELSEG_SIZE.
> > At least on my system:
> > #define BLCKSZ 8192
> > #define RELSEG_SIZE (0x40000000 / BLCKSZ)
> > So no segment file can be bigger than RELSEG_SIZE blocks (or 2^30 bytes = 1GB).
> >
> > Currently i dont have any access to the machine but tomorrow i will check the file sizes.
> >
> > Can anyone shed some light on a method of identifying all the segment files of a table?
> > The first one has the same name as the tableoid.
> > How about the subsequent segments?
>
> Your execution of dd made your first segment bigger than expected.
> The other segments have the same name with a .1, .2, etc. suffix.
>
> You have to shrink your first segment to the correct size.
> Check what happens; you should then be back to the original error.
>
> And, I have never used it, but I think the purpose of the
> zero_damaged_pages parameter is to allow postgresql itself to zero the
> bad block. (Reading src/backend/storage/buffer/bufmgr.c confirms that.)
>
> *BUT* take care that it will zero *every* bad page, perhaps not only
> the one trapping an error.
>
> In those situations, it is good to make a snapshot of the pgdata
> directory, in case your fingers surf too fast on the keyboard....
>
> If you don't want to activate zero_damaged_pages, then go and calculate
> which block in which segment you have to zero.
>
> Side note: it may be useful to have the relevant information in the
> error message...
>
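(For the archives, a minimal, untested sketch of what the zero_damaged_pages route could look like, assuming superuser access, that "mydb" stands in for the real database name, and that a cold file-level copy of the data directory is taken first, since every bad page that gets read will be irreversibly zeroed:

    # stop the server and keep a file-level copy of the cluster before anything else
    pg_ctl -D /usr/local/var/lib/pgsql/data stop
    cp -a /usr/local/var/lib/pgsql/data /usr/local/var/lib/pgsql/data.backup
    pg_ctl -D /usr/local/var/lib/pgsql/data start

    # enable the setting for a single session only, then force a full read of the
    # damaged TOAST table; each page with an invalid header is zeroed as it is read
    psql -U postgres -d mydb -c "SET zero_damaged_pages = on; SELECT count(*) FROM pg_toast.pg_toast_125716009;"

The manual dd route described below touches only the blocks already known to be bad.)
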
Many thanks, Cédric Villemain and Tom.

What i did was first to correct the first segment file with something like
dd if=216293737 of=216293737.good seek=0 bs=8192 count=131072
which effectively keeps only the first 131072 blocks (2^30 bytes = 1GB).
After that was done, and after restarting the postgresql backend, i fell back to the situation with the invalid page header, as noted before.

I stopped the backend and calculated the exact segment file and offset where the problem was:
The block with the invalid header was block number 672720.
Each segment contains at most 131072 blocks, with all but the last containing exactly 131072 blocks.
So my problematic segment was the one with number 672720 / 131072 = 5,
and the block offset inside this segment was 672720 - (5*131072) = 17360,
so i tried to zero that particular block with
dd conv=notrunc if=216293737.5 of=216293737.5.GOOD seek=17360 bs=8192 count=1
i started postgresql and it threw out a warning about initializing this zeroed page.
After that, it reported an invalid header for block 672740; i repeated the same procedure, and i was able to go further and even reindex the whole database, which went fine.

I tried, as the ultimate test (and a useful one at this point), to pg_dump the database.
Unfortunately one table seems to be in error; the error is:
pg_dump: SQL command failed
pg_dump: Error message from server: ERROR: compressed data is corrupt
pg_dump: The command was: COPY public.mail_entity (msgno, entno, entparentno....

I think the initial issue of this thread is solved; i'll come back with news on the pg_dump issue.

> >
> >> >
> >> > However, after that, unfortunately i get constant postgresql server restarts with:
> >> > FATAL: segment too big
> >> > server closed the connection unexpectedly
> >> > This probably means the server terminated abnormally
> >> > before or while processing the request.
> >> > The connection to the server was lost. Attempting reset: Succeeded.
> >> >
> >> > Is there anything i can do to salvage the situation?
> >> >
> >> > (one of) the hard parts here is that i have neither physical nor network access to the server
> >> > (only ultra expensive unreliable satellite comms)
> >> >
> >> > Thanks for any hints...
> >> >
> >> > --
> >> > Achilleas Mantzios
> >> >
> >> > --
> >> > Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
> >> > To make changes to your subscription:
> >> > http://www.postgresql.org/mailpref/pgsql-admin
> >> >
> >>
> >
> > --
> > Achilleas Mantzios
> >
> --
> Cédric Villemain

--
Achilleas Mantzios
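For anyone landing on this thread with the same errors, the block-to-segment arithmetic used above boils down to the following sketch. It assumes the default 8192-byte block size and 1GB segments (131072 blocks per segment); the block number, database directory and relfilenode are the ones mentioned in this thread and have to be adapted, and the server should be stopped, with a copy of the segment file kept, before writing to it.

    BAD_BLOCK=672720                        # block number from the "invalid page header" message
    BLOCKS_PER_SEG=131072                   # RELSEG_SIZE: 1GB / 8192-byte blocks
    SEG=$((BAD_BLOCK / BLOCKS_PER_SEG))     # -> 5     : which segment file
    OFF=$((BAD_BLOCK % BLOCKS_PER_SEG))     # -> 17360 : block offset inside that segment
    BASE=/usr/local/var/lib/pgsql/data/base/125714957/216293737
    FILE=$BASE
    [ "$SEG" -gt 0 ] && FILE=$BASE.$SEG     # segment 0 has no suffix; later ones end in .1, .2, ...
    ls -l "$BASE"*                          # lists every segment file of the relation
    dd conv=notrunc if=/dev/zero of="$FILE" seek="$OFF" bs=8192 count=1   # zero exactly one block, in place

conv=notrunc keeps dd from truncating the rest of the segment after the write, and the segment/offset mapping is what went wrong in the very first dd: seeking with the table-wide block number 672720 into the first segment file extended that file far past 1GB, which is what produced the "FATAL: segment too big".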