Thread: 8.4.0 data loss / HOT-related bug

8.4.0 data loss / HOT-related bug

From
Radoslaw Zielinski
Date:
Hello,

8.4.0 installation on i686 modern Linux, a simple production app with ~4
users entering data; normalised, no weird stuff happening there.  Recently:

  ERROR:  missing chunk number 0 for toast value 120741 in pg_toast_29644

This table has table_log and a "before update" trigger setting
"NEW.last_modified = now()".  The broken toast is a HTML blob, stored in
a "text" column.

This particular row currently has last_modified="2009-08-11", but table_log
says it has been updated on "2009-08-12" -- so, data loss.

Please see the attachment for the xmin,xmax,cmin,cmax,ctid values of
these rows.  Andrew Gireth has asked for the hexdump of page 539 of the
"promocje" table: http://radek.cc/promocje.hexdump


About a week after I've upgraded to 8.4, the users of this application
have started to complain, that some rows are disappearing.

--
Radosław Zieliński <radek@pld-linux.org>

Attachment

Re: 8.4.0 data loss / HOT-related bug

From
Greg Stark
Date:
On Fri, Aug 21, 2009 at 4:04 PM, Radoslaw Zielinski<radek@pld-linux.org> wr=
ote:

> This particular row currently has last_modified=3D"2009-08-11", but table=
_log
> says it has been updated on "2009-08-12" -- so, data loss.

These are all very low transaction ids. you say it was a week of
production before this happened?

How did you migrate your data? Was it a normal pg_dump restore? Or did
you do anything funny with slony or pg_migrate?

> Please see the attachment for the xmin,xmax,cmin,cmax,ctid values of
> these rows. =A0Andrew Gireth has asked for the hexdump of page 539 of the
> "promocje" table: http://radek.cc/promocje.hexdump

Here's a dump of that page btw.

postgres=3D# select (h).* from (select  heap_page_items(p) as h from p) as =
x;
 lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 |
t_ctid  | t_infomask2 | t_infomask | t_hoff |                  t_bits
                | t_oid
----+--------+----------+--------+--------+--------+----------+----------+-=
------------+------------+--------+----------------------------------------=
--+-------
  1 |   6896 |        1 |   1296 |   3747 |      0 |        0 |
(539,1)  |          15 |      10499 |     28 |
1011111000001110000000000000000000000000 |
  2 |      5 |        2 |      0 |        |        |          |
  |             |            |        |
          |
  3 |   1164 |        1 |    504 |   3842 |   3843 |        0 |
(539,3)  |          15 |       2439 |     28 |
1011111000001110000000000000000000000000 |
  4 |   6396 |        1 |    500 |   3806 |      0 |        0 |
(539,4)  |          15 |      10503 |     28 |
1011111000001110000000000000000000000000 |
  5 |   5928 |        1 |    468 |   3810 |      0 |        0 |
(539,5)  |      -32753 |      10503 |     28 |
1011111000001110000000000000000000000000 |
  6 |   5176 |        1 |    752 |   3812 |   3813 |        0 |
(539,6)  |          15 |       2439 |     28 |
1011111000001110000000000000000000000000 |
  7 |   4600 |        1 |    576 |   3819 |   6179 |        0 |
(539,12) |       16399 |      10503 |     28 |
1011101000001110000000000000000000000000 |
  8 |   4076 |        1 |    524 |   3823 |   3825 |        0 |
(539,8)  |          15 |       2439 |     28 |
1011111000001110000000000000000000000000 |
  9 |   2872 |        1 |   1204 |   3830 |      0 |        0 |
(539,9)  |          15 |      10499 |     28 |
1011111000001110000000000000000000000000 |
 10 |   1668 |        1 |   1204 |   3831 |   3832 |        0 |
(539,10) |          15 |       2435 |     28 |
1011111000001110000000000000000000000000 |
 11 |      0 |        3 |      0 |        |        |          |
  |             |            |        |
          |
 12 |      0 |        0 |      0 |        |        |          |
  |             |            |        |
          |
(12 rows)


--=20
greg
http://mit.edu/~gsstark/resume.pdf

Re: 8.4.0 data loss / HOT-related bug

From
Greg Stark
Date:
On Fri, Aug 21, 2009 at 5:03 PM, Greg Stark<gsstark@mit.edu> wrote:
> On Fri, Aug 21, 2009 at 4:04 PM, Radoslaw Zielinski<radek@pld-linux.org> wrote:
>
> Here's a dump of that page btw.
... [crap word-wrapped by gmail]

Sorry, clearly gmail sucks for posting things that you actually care
about how they're formatted. I've attached the page dump.

It looks like the row *was* updated by transaction 6179 and the new
version was stored in line pointer 12. However  it's marked
XMAX_INVALID which means at least somebody at some point thought 6179
had aborted and marked that hint bit.

Do we have any tools to look up a transaction id  status in the clog?
Or alternatively do you have all the wal logs and could you run
xlogdump on them looking for transaction id 6179?

--
greg
http://mit.edu/~gsstark/resume.pdf

Attachment

Re: 8.4.0 data loss / HOT-related bug

From
Alvaro Herrera
Date:
Radoslaw Zielinski wrote:
> Hello,
>
> 8.4.0 installation on i686 modern Linux, a simple production app with ~4
> users entering data; normalised, no weird stuff happening there.  Recently:
>
>   ERROR:  missing chunk number 0 for toast value 120741 in pg_toast_29644

> bug=# select id, last_modified, xmin,xmax,cmin,cmax,ctid from public.promocje where id = 4300;
>   id  |       last_modified        | xmin | xmax | cmin | cmax |  ctid
> ------+----------------------------+------+------+------+------+---------
>  4300 | 2009-08-11 09:54:11.007351 | 3819 | 6179 |    0 |    0 | (539,7)
> (1 row)

I wonder if this could be explained by xid=6179 not marked as committed
in clog.  I'd try flipping that bit and see what happens ...

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: 8.4.0 data loss / HOT-related bug

From
Alvaro Herrera
Date:
Greg Stark wrote:

> It looks like the row *was* updated by transaction 6179 and the new
> version was stored in line pointer 12. However  it's marked
> XMAX_INVALID which means at least somebody at some point thought 6179
> had aborted and marked that hint bit.

Hmm, but LP 12 shouldn't have len=0 then, no?  Unless it has been
clobbered by vacuum or HOT pruning, I guess ...

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: 8.4.0 data loss / HOT-related bug

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I wonder if this could be explained by xid=6179 not marked as committed
> in clog.  I'd try flipping that bit and see what happens ...

Once the row's been marked XMAX_INVALID, changing clog won't change that
marking ...

            regards, tom lane

Re: 8.4.0 data loss / HOT-related bug

From
Greg Stark
Date:
On Fri, Aug 21, 2009 at 5:15 PM, Alvaro
Herrera<alvherre@commandprompt.com> wrote:
> I wonder if this could be explained by xid=3D6179 not marked as committed
> in clog. =A0I'd try flipping that bit and see what happens ...

Well nothing's going to help much now. Firstly, once the hint bit gets
set nothing second-guesses that and checks the clog anyways. And
secondly the new version of the tuple is already vacuumed.

Either of two things are true.

Either transaction 6179 committed, which would explain why the toast
tuples are missing. In which case sometime later this hint bit became
set and the new version pruned. I don't know if bad memory could cause
all that to happen, would the HOT pruning logic see the hint bit and
decide to prune based on that? I suppose a bad bit hitting the clog
could cause everything though.

Alternatively 6179 aborted but somebody along the way got that wrong
and marked the toast tuples dead (and maybe vacuumed them) thinking it
had committed. That's going to be harder to tell if that's what
happened because we don't have any pointers to the specific page in
the toast table. Not unless you can dump the whole index and find
pointers in there or can find the details in the wal log.

--=20
greg
http://mit.edu/~gsstark/resume.pdf

Re: 8.4.0 data loss / HOT-related bug

From
Radoslaw Zielinski
Date:
Greg Stark <gsstark@mit.edu> [2009-08-21 18:03]:
> On Fri, Aug 21, 2009 at 4:04 PM, Radoslaw Zielinski<radek@pld-linux.org> =
wrote:
>> This particular row currently has last_modified=3D"2009-08-11", but tabl=
e_log
>> says it has been updated on "2009-08-12" -- so, data loss.

> These are all very low transaction ids. you say it was a week of
> production before this happened?

Upgraded around:
  $ rpm -q postgresql --qf '%{INSTALLTIME}' | xargs perl -le 'print~~localt=
ime shift'
  Fri Aug  7 10:06:44 2009

The first mention of the error in the app's output:
  [info] *** Request 5532 (0.008/s) [7283] [Wed Aug 19 13:06:35 2009] ***

User has reported "disappearing rows" at 12 Aug 2009, 16:58.

> How did you migrate your data? Was it a normal pg_dump restore? Or did
> you do anything funny with slony or pg_migrate?

Plain regular dump/restore (from text dump).  There's no slony or
anything on this database except for table_log and the trigger to update
last_modified.

In case it might be relevant: locale C, encoding utf8.

--=20
Rados=C5=82aw Zieli=C5=84ski <radek@pld-linux.org>

Re: 8.4.0 data loss / HOT-related bug

From
Radoslaw Zielinski
Date:
Greg Stark <gsstark@mit.edu> [2009-08-21 18:12]:
[...]
> Or alternatively do you have all the wal logs and could you run
> xlogdump on them looking for transaction id 6179?

Nope, I don't have the wal logs.

--=20
Rados=C5=82aw Zieli=C5=84ski <radek@pld-linux.org>

Re: 8.4.0 data loss / HOT-related bug

From
Greg Stark
Date:
2009/8/21 Radoslaw Zielinski <radek@pld-linux.org>:
> Greg Stark <gsstark@mit.edu> [2009-08-21 18:03]:
>> On Fri, Aug 21, 2009 at 4:04 PM, Radoslaw Zielinski<radek@pld-linux.org>=
 wrote:
>>> This particular row currently has last_modified=3D"2009-08-11", but tab=
le_log
>>> says it has been updated on "2009-08-12" -- so, data loss.
>
>> These are all very low transaction ids. you say it was a week of
>> production before this happened?
>
> Upgraded around:
> =A0$ rpm -q postgresql --qf '%{INSTALLTIME}' | xargs perl -le 'print~~loc=
altime shift'
> =A0Fri Aug =A07 10:06:44 2009

Is this, perchance, new hardware? Did you test the memory in it?

Do you have other records that have "disappeared"?

--=20
greg
http://mit.edu/~gsstark/resume.pdf

Re: 8.4.0 data loss / HOT-related bug

From
Radoslaw Zielinski
Date:
Greg Stark <gsstark@mit.edu> [2009-08-21 18:55]:
[...]
> Is this, perchance, new hardware? Did you test the memory in it?

It's a "virtual private server"; the hosting provider is swearing
everything's fine.  I can't vouch for it myself, obviously.

> Do you have other records that have "disappeared"?

I haven't found any (yet).  So far, verified with:


create function exec(text) returns void language plpgsql as $f$ begin execu=
te $1; end; $f$;
select exec(
  'select log_id, id from ( select distinct ''' || relname || ''', case fir=
st_value(trigger_mode) over w when ''DELETE'' then null else id end as log_=
id, first_value(last_modified) over w as lm from log.' || relname || ' wind=
ow w as (partition by id order by id desc,trigger_id desc ) )x left join pu=
blic.' || relname || ' on id=3Dlog_id where ( id is null or last_modified <=
> x.lm ) and log_id is not null'
  ) from pg_stat_user_tables where schemaname =3D 'log';


(Looks at the log.* table, finds the largest last_modified of the
non-deleted rows, joins the public.* table on id, compares
last_modified).

--=20
Rados=C5=82aw Zieli=C5=84ski <radek@pld-linux.org>

Re: 8.4.0 data loss / HOT-related bug

From
Radoslaw Zielinski
Date:
Greg Stark <gsstark@mit.edu> [2009-08-21 18:38]:
[...]
> the toast table. Not unless you can dump the whole index and find
> pointers in there or can find the details in the wal log.

You mean the primary key index?  How do I dump it?

--=20
Rados=C5=82aw Zieli=C5=84ski <radek@pld-linux.org>

Re: 8.4.0 data loss / HOT-related bug

From
Greg Stark
Date:
2009/8/21 Radoslaw Zielinski <radek@pld-linux.org>:
> Greg Stark <gsstark@mit.edu> [2009-08-21 18:38]:
> [...]
>> the toast table. Not unless you can dump the whole index and find
>> pointers in there or can find the details in the wal log.
>
> You mean the primary key index? =A0How do I dump it?

Actually I mean the key for the toast table.

Let me ask firstly do you get anything if you just do select * from
pg_toast.pg_toast_29644 where chunk_id =3D 120741 ?

And secondly, what do you get if you do "select length(htmlblob) from
tab where id =3D ?" which should work even if the toast data is broken.

To dump the index -- this will be tedious, but:

Load the pageinspect contrib module and run this:
select (x).* from (select bt_page_items('t_pkey',1) as x ) as y;
select (x).* from (select bt_page_items('t_pkey',2) as x ) as y;
...

Until you get an error. Then look through the output for any pointers

The hex at the end is the index key which in the case of a toast table
starts with the chunk id, you're looking for 120741 which is 0001D7A5
in hex or A5D70100 in little-endian...

If you find any such rows send them and they might tell us what's
going on or where to look for the missing toast records to see what's
going on.

However I'm kind of skeptical that you'll find any. IIRC Btree marks
pointers that it finds point to dead records and reuses them as soon
as possible. Still, maybe nothing's been inserted on that page yet.


--=20
greg
http://mit.edu/~gsstark/resume.pdf

Re: 8.4.0 data loss / HOT-related bug

From
Tom Lane
Date:
Radoslaw Zielinski <radek@pld-linux.org> writes:
> Greg Stark <gsstark@mit.edu> [2009-08-21 18:55]:
>> Is this, perchance, new hardware? Did you test the memory in it?

> It's a "virtual private server"; the hosting provider is swearing
> everything's fine.  I can't vouch for it myself, obviously.

Hmmm ... have you had any crashes?  I'd wonder how trustworthy fsync is
on such a setup.

            regards, tom lane

Re: 8.4.0 data loss / HOT-related bug

From
Radoslaw Zielinski
Date:
Greg Stark <gsstark@mit.edu> [2009-08-21 20:30]:
[...]
> Actually I mean the key for the toast table.

> Let me ask firstly do you get anything if you just do select * from
> pg_toast.pg_toast_29644 where chunk_id =3D 120741 ?

0 rows.

> And secondly, what do you get if you do "select length(htmlblob) from
> tab where id =3D ?" which should work even if the toast data is broken.

  bug=3D# select length(opis_szczeg) from promocje where id =3D 4300;
  ERROR:  missing chunk number 0 for toast value 120741 in pg_toast_29644

> To dump the index -- this will be tedious, but:

  $ seq 11 | xargs -ti psql bug -c "select {}, (x).* from (select bt_page_i=
tems('promocje_pkey',{}) as x ) as y"
  [...]
  psql bug -c select 11, (x).* from (select bt_page_items('promocje_pkey',1=
1) as x ) as y=20
  ERROR:  block number out of range

Output: http://radek.cc/promocje_bpi.txt.gz

[...]
> If you find any such rows send them and they might tell us what's
> going on or where to look for the missing toast records to see what's
> going on.

No luck.

--=20
Rados=C5=82aw Zieli=C5=84ski <radek@pld-linux.org>

Re: 8.4.0 data loss / HOT-related bug

From
Radoslaw Zielinski
Date:
Tom Lane <tgl@sss.pgh.pa.us> [2009-08-21 20:35]:
> Radoslaw Zielinski <radek@pld-linux.org> writes:
>> Greg Stark <gsstark@mit.edu> [2009-08-21 18:55]:
>>> Is this, perchance, new hardware? Did you test the memory in it?

>> It's a "virtual private server"; the hosting provider is swearing
>> everything's fine.  I can't vouch for it myself, obviously.

> Hmmm ... have you had any crashes?  I'd wonder how trustworthy fsync is
> on such a setup.

Nope (at least nothing I have noticed).  BTW: it's OpenVZ.

--=20
Rados=C5=82aw Zieli=C5=84ski <radek@pld-linux.org>

Re: 8.4.0 data loss / HOT-related bug

From
Greg Stark
Date:
2009/8/22 Radoslaw Zielinski <radek@pld-linux.org>:
> =A0bug=3D# select length(opis_szczeg) from promocje where id =3D 4300;
> =A0ERROR: =A0missing chunk number 0 for toast value 120741 in pg_toast_29=
644

Sorry, what datatype is this again? And what encoding?

Perhaps I should have said octet_length() instead of length().


--=20
greg
http://mit.edu/~gsstark/resume.pdf

Re: 8.4.0 data loss / HOT-related bug

From
Greg Stark
Date:
2009/8/22 Radoslaw Zielinski <radek@pld-linux.org>:
> =A0$ seq 11 | xargs -ti psql bug -c "select {}, (x).* from (select bt_pag=
e_items('promocje_pkey',{}) as x ) as y"
> =A0[...]
> =A0psql bug -c select 11, (x).* from (select bt_page_items('promocje_pkey=
',11) as x ) as y
> =A0ERROR: =A0block number out of range

Sorry, I forgot to adjust the index name in the example. We don't want
the primary key index of your table we want the index of the toast
table.

Try the same thing with bt_page_items('pg_toast.pg_toast_29644_index',...)


--=20
greg
http://mit.edu/~gsstark/resume.pdf

Re: 8.4.0 data loss / HOT-related bug

From
Radoslaw Zielinski
Date:
Greg Stark <gsstark@mit.edu> [2009-08-23 03:40]:
> 2009/8/22 Radoslaw Zielinski <radek@pld-linux.org>:
>> =C2=A0bug=3D# select length(opis_szczeg) from promocje where id =3D 4300;
>> =C2=A0ERROR: =C2=A0missing chunk number 0 for toast value 120741 in pg_t=
oast_29644
> Sorry, what datatype is this again? And what encoding?

text, UTF-8 (locale C).

> Perhaps I should have said octet_length() instead of length().

  bug=3D# select octet_length(opis_szczeg) from promocje where id =3D 4300;
   octet_length=20
  --------------
           3081
  (1 row)

--=20
Rados=C5=82aw Zieli=C5=84ski <radek@pld-linux.org>

Re: 8.4.0 data loss / HOT-related bug

From
Radoslaw Zielinski
Date:
Greg Stark <gsstark@mit.edu> [2009-08-23 03:44]:
> 2009/8/22 Radoslaw Zielinski <radek@pld-linux.org>:
>> =C2=A0$ seq 11 | xargs -ti psql bug -c "select {}, (x).* from (select bt=
_page_items('promocje_pkey',{}) as x ) as y"
>> =C2=A0[...]
>> =C2=A0psql bug -c select 11, (x).* from (select bt_page_items('promocje_=
pkey',11) as x ) as y
>> =C2=A0ERROR: =C2=A0block number out of range

> Sorry, I forgot to adjust the index name in the example. We don't want
> the primary key index of your table we want the index of the toast
> table.

Sorry, I got confused by the index not being shown in \d pg_toast.pg_toast_=
29644.

> Try the same thing with bt_page_items('pg_toast.pg_toast_29644_index',...)

bug=3D# select * from bt_page_items('pg_toast.pg_toast_29644_index', 8) whe=
re data ~ 'a5 d7 01 00';
-[ RECORD 1 ]-----------------------
itemoffset | 121
ctid       | (458,3)
itemlen    | 16
nulls      | f
vars       | f
data       | a5 d7 01 00 00 00 00 00

Full output, without the WHERE: http://radek.cc/promocje_bpi2.txt.gz
get_raw_page( 'pg_toast.pg_toast_29644', 458 ): http://radek.cc/toast_29644=
_page_458.txt

bug=3D# select * from heap_page_items( get_raw_page( 'pg_toast.pg_toast_296=
44', 458 ) );
 lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid  | t=
_infomask2 | t_infomask | t_hoff | t_bits | t_oid=20
----+--------+----------+--------+--------+--------+----------+---------+--=
-----------+------------+--------+--------+-------
  1 |   6156 |        1 |   2036 |   3816 |      0 |        1 | (458,1) |  =
         3 |       2306 |     24 |        |=20=20=20=20=20=20
  2 |   4176 |        1 |   1979 |   3816 |      0 |        1 | (458,2) |  =
         3 |       2306 |     24 |        |=20=20=20=20=20=20
  3 |      0 |        3 |      0 |        |        |          |         |  =
           |            |        |        |=20=20=20=20=20=20
  4 |   2140 |        1 |   2036 |   3823 |      0 |        0 | (458,4) |  =
         3 |       2306 |     24 |        |=20=20=20=20=20=20
(4 rows)

bug=3D# select * from bt_page_stats('pg_toast.pg_toast_29644_index', 8);
-[ RECORD 1 ]-+-----
blkno         | 8
type          | l
live_items    | 366
dead_items    | 1
avg_item_size | 16
page_size     | 8192
free_size     | 808
btpo_prev     | 7
btpo_next     | 9
btpo          | 0
btpo_flags    | 65

--=20
Rados=C5=82aw Zieli=C5=84ski <radek@pld-linux.org>

Re: 8.4.0 data loss / HOT-related bug

From
Andrew Gierth
Date:
>>>>> "Greg" == Greg Stark <gsstark@mit.edu> writes:

 Greg> Either of two things are true.
 Greg> Either transaction 6179 committed,
[snip]

This is all missing the point. The row should have been killed by
transaction 4971, NOT 6179. By the time transaction 6179 tried to
do anything with it, it was almost certainly already broken (or
possibly 6179 broke it).

Notice that in the log table, the log entry that records the most
recent update to the row is the one with xmin=4971. There is no
entry in the log table corresponding to 6179.

--
Andrew (irc:RhodiumToad)

Re: 8.4.0 data loss / HOT-related bug

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Greg" == Greg Stark <gsstark@mit.edu> writes:
>  Greg> Either of two things are true.
>  Greg> Either transaction 6179 committed,
> [snip]

> This is all missing the point. The row should have been killed by
> transaction 4971, NOT 6179. By the time transaction 6179 tried to
> do anything with it, it was almost certainly already broken (or
> possibly 6179 broke it).

If there have been any crashes in this database, the problem might be
a variant of Jeff Ross' issue --- to wit, the row killed by
transaction 4971 actually is dead (and its associated toast items
have been removed), but that row is still found by seqscans because
of an incorrect PD_ALL_VISIBLE flag.

            regards, tom lane