Thread: 8.4.0 data loss / HOT-related bug
Hello, 8.4.0 installation on i686 modern Linux, a simple production app with ~4 users entering data; normalised, no weird stuff happening there. Recently: ERROR: missing chunk number 0 for toast value 120741 in pg_toast_29644 This table has table_log and a "before update" trigger setting "NEW.last_modified = now()". The broken toast is a HTML blob, stored in a "text" column. This particular row currently has last_modified="2009-08-11", but table_log says it has been updated on "2009-08-12" -- so, data loss. Please see the attachment for the xmin,xmax,cmin,cmax,ctid values of these rows. Andrew Gireth has asked for the hexdump of page 539 of the "promocje" table: http://radek.cc/promocje.hexdump About a week after I've upgraded to 8.4, the users of this application have started to complain, that some rows are disappearing. -- Radosław Zieliński <radek@pld-linux.org>
Attachment
On Fri, Aug 21, 2009 at 4:04 PM, Radoslaw Zielinski<radek@pld-linux.org> wr= ote: > This particular row currently has last_modified=3D"2009-08-11", but table= _log > says it has been updated on "2009-08-12" -- so, data loss. These are all very low transaction ids. you say it was a week of production before this happened? How did you migrate your data? Was it a normal pg_dump restore? Or did you do anything funny with slony or pg_migrate? > Please see the attachment for the xmin,xmax,cmin,cmax,ctid values of > these rows. =A0Andrew Gireth has asked for the hexdump of page 539 of the > "promocje" table: http://radek.cc/promocje.hexdump Here's a dump of that page btw. postgres=3D# select (h).* from (select heap_page_items(p) as h from p) as = x; lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid ----+--------+----------+--------+--------+--------+----------+----------+-= ------------+------------+--------+----------------------------------------= --+------- 1 | 6896 | 1 | 1296 | 3747 | 0 | 0 | (539,1) | 15 | 10499 | 28 | 1011111000001110000000000000000000000000 | 2 | 5 | 2 | 0 | | | | | | | | | 3 | 1164 | 1 | 504 | 3842 | 3843 | 0 | (539,3) | 15 | 2439 | 28 | 1011111000001110000000000000000000000000 | 4 | 6396 | 1 | 500 | 3806 | 0 | 0 | (539,4) | 15 | 10503 | 28 | 1011111000001110000000000000000000000000 | 5 | 5928 | 1 | 468 | 3810 | 0 | 0 | (539,5) | -32753 | 10503 | 28 | 1011111000001110000000000000000000000000 | 6 | 5176 | 1 | 752 | 3812 | 3813 | 0 | (539,6) | 15 | 2439 | 28 | 1011111000001110000000000000000000000000 | 7 | 4600 | 1 | 576 | 3819 | 6179 | 0 | (539,12) | 16399 | 10503 | 28 | 1011101000001110000000000000000000000000 | 8 | 4076 | 1 | 524 | 3823 | 3825 | 0 | (539,8) | 15 | 2439 | 28 | 1011111000001110000000000000000000000000 | 9 | 2872 | 1 | 1204 | 3830 | 0 | 0 | (539,9) | 15 | 10499 | 28 | 1011111000001110000000000000000000000000 | 10 | 1668 | 1 | 1204 | 3831 | 3832 | 0 | (539,10) | 15 | 2435 | 28 | 1011111000001110000000000000000000000000 | 11 | 0 | 3 | 0 | | | | | | | | | 12 | 0 | 0 | 0 | | | | | | | | | (12 rows) --=20 greg http://mit.edu/~gsstark/resume.pdf
On Fri, Aug 21, 2009 at 5:03 PM, Greg Stark<gsstark@mit.edu> wrote: > On Fri, Aug 21, 2009 at 4:04 PM, Radoslaw Zielinski<radek@pld-linux.org> wrote: > > Here's a dump of that page btw. ... [crap word-wrapped by gmail] Sorry, clearly gmail sucks for posting things that you actually care about how they're formatted. I've attached the page dump. It looks like the row *was* updated by transaction 6179 and the new version was stored in line pointer 12. However it's marked XMAX_INVALID which means at least somebody at some point thought 6179 had aborted and marked that hint bit. Do we have any tools to look up a transaction id status in the clog? Or alternatively do you have all the wal logs and could you run xlogdump on them looking for transaction id 6179? -- greg http://mit.edu/~gsstark/resume.pdf
Attachment
Radoslaw Zielinski wrote: > Hello, > > 8.4.0 installation on i686 modern Linux, a simple production app with ~4 > users entering data; normalised, no weird stuff happening there. Recently: > > ERROR: missing chunk number 0 for toast value 120741 in pg_toast_29644 > bug=# select id, last_modified, xmin,xmax,cmin,cmax,ctid from public.promocje where id = 4300; > id | last_modified | xmin | xmax | cmin | cmax | ctid > ------+----------------------------+------+------+------+------+--------- > 4300 | 2009-08-11 09:54:11.007351 | 3819 | 6179 | 0 | 0 | (539,7) > (1 row) I wonder if this could be explained by xid=6179 not marked as committed in clog. I'd try flipping that bit and see what happens ... -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Greg Stark wrote: > It looks like the row *was* updated by transaction 6179 and the new > version was stored in line pointer 12. However it's marked > XMAX_INVALID which means at least somebody at some point thought 6179 > had aborted and marked that hint bit. Hmm, but LP 12 shouldn't have len=0 then, no? Unless it has been clobbered by vacuum or HOT pruning, I guess ... -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > I wonder if this could be explained by xid=6179 not marked as committed > in clog. I'd try flipping that bit and see what happens ... Once the row's been marked XMAX_INVALID, changing clog won't change that marking ... regards, tom lane
On Fri, Aug 21, 2009 at 5:15 PM, Alvaro Herrera<alvherre@commandprompt.com> wrote: > I wonder if this could be explained by xid=3D6179 not marked as committed > in clog. =A0I'd try flipping that bit and see what happens ... Well nothing's going to help much now. Firstly, once the hint bit gets set nothing second-guesses that and checks the clog anyways. And secondly the new version of the tuple is already vacuumed. Either of two things are true. Either transaction 6179 committed, which would explain why the toast tuples are missing. In which case sometime later this hint bit became set and the new version pruned. I don't know if bad memory could cause all that to happen, would the HOT pruning logic see the hint bit and decide to prune based on that? I suppose a bad bit hitting the clog could cause everything though. Alternatively 6179 aborted but somebody along the way got that wrong and marked the toast tuples dead (and maybe vacuumed them) thinking it had committed. That's going to be harder to tell if that's what happened because we don't have any pointers to the specific page in the toast table. Not unless you can dump the whole index and find pointers in there or can find the details in the wal log. --=20 greg http://mit.edu/~gsstark/resume.pdf
Greg Stark <gsstark@mit.edu> [2009-08-21 18:03]: > On Fri, Aug 21, 2009 at 4:04 PM, Radoslaw Zielinski<radek@pld-linux.org> = wrote: >> This particular row currently has last_modified=3D"2009-08-11", but tabl= e_log >> says it has been updated on "2009-08-12" -- so, data loss. > These are all very low transaction ids. you say it was a week of > production before this happened? Upgraded around: $ rpm -q postgresql --qf '%{INSTALLTIME}' | xargs perl -le 'print~~localt= ime shift' Fri Aug 7 10:06:44 2009 The first mention of the error in the app's output: [info] *** Request 5532 (0.008/s) [7283] [Wed Aug 19 13:06:35 2009] *** User has reported "disappearing rows" at 12 Aug 2009, 16:58. > How did you migrate your data? Was it a normal pg_dump restore? Or did > you do anything funny with slony or pg_migrate? Plain regular dump/restore (from text dump). There's no slony or anything on this database except for table_log and the trigger to update last_modified. In case it might be relevant: locale C, encoding utf8. --=20 Rados=C5=82aw Zieli=C5=84ski <radek@pld-linux.org>
Greg Stark <gsstark@mit.edu> [2009-08-21 18:12]: [...] > Or alternatively do you have all the wal logs and could you run > xlogdump on them looking for transaction id 6179? Nope, I don't have the wal logs. --=20 Rados=C5=82aw Zieli=C5=84ski <radek@pld-linux.org>
2009/8/21 Radoslaw Zielinski <radek@pld-linux.org>: > Greg Stark <gsstark@mit.edu> [2009-08-21 18:03]: >> On Fri, Aug 21, 2009 at 4:04 PM, Radoslaw Zielinski<radek@pld-linux.org>= wrote: >>> This particular row currently has last_modified=3D"2009-08-11", but tab= le_log >>> says it has been updated on "2009-08-12" -- so, data loss. > >> These are all very low transaction ids. you say it was a week of >> production before this happened? > > Upgraded around: > =A0$ rpm -q postgresql --qf '%{INSTALLTIME}' | xargs perl -le 'print~~loc= altime shift' > =A0Fri Aug =A07 10:06:44 2009 Is this, perchance, new hardware? Did you test the memory in it? Do you have other records that have "disappeared"? --=20 greg http://mit.edu/~gsstark/resume.pdf
Greg Stark <gsstark@mit.edu> [2009-08-21 18:55]: [...] > Is this, perchance, new hardware? Did you test the memory in it? It's a "virtual private server"; the hosting provider is swearing everything's fine. I can't vouch for it myself, obviously. > Do you have other records that have "disappeared"? I haven't found any (yet). So far, verified with: create function exec(text) returns void language plpgsql as $f$ begin execu= te $1; end; $f$; select exec( 'select log_id, id from ( select distinct ''' || relname || ''', case fir= st_value(trigger_mode) over w when ''DELETE'' then null else id end as log_= id, first_value(last_modified) over w as lm from log.' || relname || ' wind= ow w as (partition by id order by id desc,trigger_id desc ) )x left join pu= blic.' || relname || ' on id=3Dlog_id where ( id is null or last_modified <= > x.lm ) and log_id is not null' ) from pg_stat_user_tables where schemaname =3D 'log'; (Looks at the log.* table, finds the largest last_modified of the non-deleted rows, joins the public.* table on id, compares last_modified). --=20 Rados=C5=82aw Zieli=C5=84ski <radek@pld-linux.org>
Greg Stark <gsstark@mit.edu> [2009-08-21 18:38]: [...] > the toast table. Not unless you can dump the whole index and find > pointers in there or can find the details in the wal log. You mean the primary key index? How do I dump it? --=20 Rados=C5=82aw Zieli=C5=84ski <radek@pld-linux.org>
2009/8/21 Radoslaw Zielinski <radek@pld-linux.org>: > Greg Stark <gsstark@mit.edu> [2009-08-21 18:38]: > [...] >> the toast table. Not unless you can dump the whole index and find >> pointers in there or can find the details in the wal log. > > You mean the primary key index? =A0How do I dump it? Actually I mean the key for the toast table. Let me ask firstly do you get anything if you just do select * from pg_toast.pg_toast_29644 where chunk_id =3D 120741 ? And secondly, what do you get if you do "select length(htmlblob) from tab where id =3D ?" which should work even if the toast data is broken. To dump the index -- this will be tedious, but: Load the pageinspect contrib module and run this: select (x).* from (select bt_page_items('t_pkey',1) as x ) as y; select (x).* from (select bt_page_items('t_pkey',2) as x ) as y; ... Until you get an error. Then look through the output for any pointers The hex at the end is the index key which in the case of a toast table starts with the chunk id, you're looking for 120741 which is 0001D7A5 in hex or A5D70100 in little-endian... If you find any such rows send them and they might tell us what's going on or where to look for the missing toast records to see what's going on. However I'm kind of skeptical that you'll find any. IIRC Btree marks pointers that it finds point to dead records and reuses them as soon as possible. Still, maybe nothing's been inserted on that page yet. --=20 greg http://mit.edu/~gsstark/resume.pdf
Radoslaw Zielinski <radek@pld-linux.org> writes: > Greg Stark <gsstark@mit.edu> [2009-08-21 18:55]: >> Is this, perchance, new hardware? Did you test the memory in it? > It's a "virtual private server"; the hosting provider is swearing > everything's fine. I can't vouch for it myself, obviously. Hmmm ... have you had any crashes? I'd wonder how trustworthy fsync is on such a setup. regards, tom lane
Greg Stark <gsstark@mit.edu> [2009-08-21 20:30]: [...] > Actually I mean the key for the toast table. > Let me ask firstly do you get anything if you just do select * from > pg_toast.pg_toast_29644 where chunk_id =3D 120741 ? 0 rows. > And secondly, what do you get if you do "select length(htmlblob) from > tab where id =3D ?" which should work even if the toast data is broken. bug=3D# select length(opis_szczeg) from promocje where id =3D 4300; ERROR: missing chunk number 0 for toast value 120741 in pg_toast_29644 > To dump the index -- this will be tedious, but: $ seq 11 | xargs -ti psql bug -c "select {}, (x).* from (select bt_page_i= tems('promocje_pkey',{}) as x ) as y" [...] psql bug -c select 11, (x).* from (select bt_page_items('promocje_pkey',1= 1) as x ) as y=20 ERROR: block number out of range Output: http://radek.cc/promocje_bpi.txt.gz [...] > If you find any such rows send them and they might tell us what's > going on or where to look for the missing toast records to see what's > going on. No luck. --=20 Rados=C5=82aw Zieli=C5=84ski <radek@pld-linux.org>
Tom Lane <tgl@sss.pgh.pa.us> [2009-08-21 20:35]: > Radoslaw Zielinski <radek@pld-linux.org> writes: >> Greg Stark <gsstark@mit.edu> [2009-08-21 18:55]: >>> Is this, perchance, new hardware? Did you test the memory in it? >> It's a "virtual private server"; the hosting provider is swearing >> everything's fine. I can't vouch for it myself, obviously. > Hmmm ... have you had any crashes? I'd wonder how trustworthy fsync is > on such a setup. Nope (at least nothing I have noticed). BTW: it's OpenVZ. --=20 Rados=C5=82aw Zieli=C5=84ski <radek@pld-linux.org>
2009/8/22 Radoslaw Zielinski <radek@pld-linux.org>: > =A0bug=3D# select length(opis_szczeg) from promocje where id =3D 4300; > =A0ERROR: =A0missing chunk number 0 for toast value 120741 in pg_toast_29= 644 Sorry, what datatype is this again? And what encoding? Perhaps I should have said octet_length() instead of length(). --=20 greg http://mit.edu/~gsstark/resume.pdf
2009/8/22 Radoslaw Zielinski <radek@pld-linux.org>: > =A0$ seq 11 | xargs -ti psql bug -c "select {}, (x).* from (select bt_pag= e_items('promocje_pkey',{}) as x ) as y" > =A0[...] > =A0psql bug -c select 11, (x).* from (select bt_page_items('promocje_pkey= ',11) as x ) as y > =A0ERROR: =A0block number out of range Sorry, I forgot to adjust the index name in the example. We don't want the primary key index of your table we want the index of the toast table. Try the same thing with bt_page_items('pg_toast.pg_toast_29644_index',...) --=20 greg http://mit.edu/~gsstark/resume.pdf
Greg Stark <gsstark@mit.edu> [2009-08-23 03:40]: > 2009/8/22 Radoslaw Zielinski <radek@pld-linux.org>: >> =C2=A0bug=3D# select length(opis_szczeg) from promocje where id =3D 4300; >> =C2=A0ERROR: =C2=A0missing chunk number 0 for toast value 120741 in pg_t= oast_29644 > Sorry, what datatype is this again? And what encoding? text, UTF-8 (locale C). > Perhaps I should have said octet_length() instead of length(). bug=3D# select octet_length(opis_szczeg) from promocje where id =3D 4300; octet_length=20 -------------- 3081 (1 row) --=20 Rados=C5=82aw Zieli=C5=84ski <radek@pld-linux.org>
Greg Stark <gsstark@mit.edu> [2009-08-23 03:44]: > 2009/8/22 Radoslaw Zielinski <radek@pld-linux.org>: >> =C2=A0$ seq 11 | xargs -ti psql bug -c "select {}, (x).* from (select bt= _page_items('promocje_pkey',{}) as x ) as y" >> =C2=A0[...] >> =C2=A0psql bug -c select 11, (x).* from (select bt_page_items('promocje_= pkey',11) as x ) as y >> =C2=A0ERROR: =C2=A0block number out of range > Sorry, I forgot to adjust the index name in the example. We don't want > the primary key index of your table we want the index of the toast > table. Sorry, I got confused by the index not being shown in \d pg_toast.pg_toast_= 29644. > Try the same thing with bt_page_items('pg_toast.pg_toast_29644_index',...) bug=3D# select * from bt_page_items('pg_toast.pg_toast_29644_index', 8) whe= re data ~ 'a5 d7 01 00'; -[ RECORD 1 ]----------------------- itemoffset | 121 ctid | (458,3) itemlen | 16 nulls | f vars | f data | a5 d7 01 00 00 00 00 00 Full output, without the WHERE: http://radek.cc/promocje_bpi2.txt.gz get_raw_page( 'pg_toast.pg_toast_29644', 458 ): http://radek.cc/toast_29644= _page_458.txt bug=3D# select * from heap_page_items( get_raw_page( 'pg_toast.pg_toast_296= 44', 458 ) ); lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t= _infomask2 | t_infomask | t_hoff | t_bits | t_oid=20 ----+--------+----------+--------+--------+--------+----------+---------+--= -----------+------------+--------+--------+------- 1 | 6156 | 1 | 2036 | 3816 | 0 | 1 | (458,1) | = 3 | 2306 | 24 | |=20=20=20=20=20=20 2 | 4176 | 1 | 1979 | 3816 | 0 | 1 | (458,2) | = 3 | 2306 | 24 | |=20=20=20=20=20=20 3 | 0 | 3 | 0 | | | | | = | | | |=20=20=20=20=20=20 4 | 2140 | 1 | 2036 | 3823 | 0 | 0 | (458,4) | = 3 | 2306 | 24 | |=20=20=20=20=20=20 (4 rows) bug=3D# select * from bt_page_stats('pg_toast.pg_toast_29644_index', 8); -[ RECORD 1 ]-+----- blkno | 8 type | l live_items | 366 dead_items | 1 avg_item_size | 16 page_size | 8192 free_size | 808 btpo_prev | 7 btpo_next | 9 btpo | 0 btpo_flags | 65 --=20 Rados=C5=82aw Zieli=C5=84ski <radek@pld-linux.org>
>>>>> "Greg" == Greg Stark <gsstark@mit.edu> writes: Greg> Either of two things are true. Greg> Either transaction 6179 committed, [snip] This is all missing the point. The row should have been killed by transaction 4971, NOT 6179. By the time transaction 6179 tried to do anything with it, it was almost certainly already broken (or possibly 6179 broke it). Notice that in the log table, the log entry that records the most recent update to the row is the one with xmin=4971. There is no entry in the log table corresponding to 6179. -- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > "Greg" == Greg Stark <gsstark@mit.edu> writes: > Greg> Either of two things are true. > Greg> Either transaction 6179 committed, > [snip] > This is all missing the point. The row should have been killed by > transaction 4971, NOT 6179. By the time transaction 6179 tried to > do anything with it, it was almost certainly already broken (or > possibly 6179 broke it). If there have been any crashes in this database, the problem might be a variant of Jeff Ross' issue --- to wit, the row killed by transaction 4971 actually is dead (and its associated toast items have been removed), but that row is still found by seqscans because of an incorrect PD_ALL_VISIBLE flag. regards, tom lane