Problem with TOAST column corruption - Mailing list pgsql-hackers

From Bruce Momjian
Subject Problem with TOAST column corruption
Date
Msg-id 200112270614.fBR6EoH18425@candle.pha.pa.us
Whole thread Raw
Responses Re: Problem with TOAST column corruption
List pgsql-hackers
I have a /data directory from someone running 7.1.3 and they are seeing
data corruption on a table using TOAST columns.  Specifically, they are
seeing this:test=> SELECT woman FROM user_details WHERE uid = '00eezEoLyWJK';ERROR:  Relation 1 does not exist
test=> SELECT question_num FROM user_details WHERE uid = '00eezEoLyWJK';ERROR:  MemoryContextAlloc: invalid request
size2139062147
 

The other columns in the same row are fine, and the other rows in the
table are fine too.  This is with RH Linux 6.2.  They have not turned
off fsync.

The backtrace for the first failure is:#0  elog (lev=-1, fmt=0x813a5de "Relation %u does not exist") at elog.c:119#1
0x806ef17in heap_open (relationId=1, lockmode=1) at heapam.c:589#2  0x80736a5 in toast_fetch_datum (attr=0x2b5bd124) at
tuptoaster.c:972#3 0x80720bc in heap_tuple_untoast_attr (attr=0x2b5bd124) at tuptoaster.c:127#4  0x812f556 in
pg_detoast_datum(datum=0x2b5bd124) at fmgr.c:1434#5  0x80678a1 in printtup (tuple=0x2b5bd020, typeinfo=0x8335fb0,
self=0x83498f0)at printtup.c:206#6  0x80b6609 in ExecRetrieve (slot=0x833d608, destfunc=0x83498f0,
estate=0x833d338)at execMain.c:1187#7  0x80b6523 in ExecutePlan (estate=0x833d338, plan=0x833d2b0,
operation=CMD_SELECT,numberTuples=0, direction=ForwardScanDirection,     destfunc=0x83498f0) at execMain.c:1107#8
0x80b5b33in ExecutorRun (queryDesc=0x8335f20, estate=0x833d338,     feature=3, count=0) at execMain.c:233#9  0x80f6ce3
inProcessQuery (parsetree=0x8328e38, plan=0x833d2b0,     dest=Remote) at pquery.c:295#10 0x80f58eb in
pg_exec_query_string(    query_string=0x8328a68 "select * from user_details where uid = '00eezEoLyWJK';", dest=Remote,
parse_context=0x8302b48)at postgres.c:810
 

Obviously, it is a TOAST-related problem.  The error is happening while
the Datum is trying to be untoasted.  If I look at the Datum for the
first failure I see:
(gdb) print *(struct varlena *)datum$1 = {vl_len = 134507004, vl_dat = "c"}
(gdb) printf "%x\n",  (struct varlena *)datum.vl_len.vl_len80469fc

As you can see, the high bit 0x08 is set, meaning that data is external,
and 0x04 is not set, meaning it is not compressed.  However, when it
attempts to find the TOAST value, it fails trying to open a relation
with oid equal to 1;  obviously a problem.

Here is the frame of 'attr' which has the improper relid:
 (gdb) print *attr $8 = {va_header = -1072362015, va_content = {va_compressed = {va_rawsize = 1,      va_data = ""},
va_external= {va_rawsize = 1, va_extsize = 0,      va_valueid = 47, va_toastrelid = 1, va_toastidxid = 0, va_rowid = 0,

                      ^^^^^^^^^^^^^^^^^     va_attno = 0}, va_data = "\001"}}

This is the varattrib structure stored on disk for TOAST entries. 
Obviously something is wrong there.  But how, and does this problem
still exist in current sources?  Was it already fixed?

The MemoryContextAlloc error backtrace is:#0  elog (lev=-1, fmt=0x81bf278 "MemoryContextAlloc: invalid request size
%lu")   at elog.c:120#1  0x81682ae in MemoryContextAlloc (context=0x8294800, size=2139062147)    at mcxt.c:418#2
0x8078ca5in heap_tuple_untoast_attr (attr=0x831f54c) at tuptoaster.c:151#3  0x8162888 in pg_detoast_datum
(datum=0x831f54c)at fmgr.c:1434#4  0x8066dac in printtup (tuple=0x831f514, typeinfo=0x831f144, self=0x831f4cc)    at
printtup.c:206#5 0x80d1088 in ExecRetrieve (slot=0x831ee80, destfunc=0x831f4cc,     estate=0x831e7c0) at
execMain.c:1187#6 0x80d0fd7 in ExecutePlan (estate=0x831e7c0, plan=0x831e734,     operation=CMD_SELECT, numberTuples=0,
direction=ForwardScanDirection,    destfunc=0x831f4cc) at execMain.c:1107#7  0x80d02d6 in ExecutorRun
(queryDesc=0x831ee1c,estate=0x831e7c0,     feature=3, count=0) at execMain.c:233#8  0x81221cb in ProcessQuery
(parsetree=0x831a45c,plan=0x831e734,     dest=Remote) at pquery.c:295#9  0x8120aa1 in pg_exec_query_string (
query_string=0x831a038"select question_num from user_details where uid = '00eezEoLyWJK';", dest=Remote,
parse_context=0x82946e8)at postgres.c:810#10 0x8121c38 in PostgresMain (argc=5, argv=0x8046f1c, real_argc=6,
real_argv=0x804786c,username=0x8259661 "postgres") at postgres.c:1908#11 0x81078b0 in DoBackend (port=0x8259400) at
postmaster.c:2114#120x81073f6 in BackendStartup (port=0x8259400) at postmaster.c:1897#13 0x8106579 in ServerLoop () at
postmaster.c:995#140x8105f59 in PostmasterMain (argc=6, argv=0x804786c) at postmaster.c:685#15 0x80e2ed2 in main
(argc=6,argv=0x804786c) at main.c:171#16 0x8064c4e in __start ()
 

Frame 3 reports datum as:
(gdb) print *(struct varlena *)datum$5 = {vl_len = 2139062142, vl_dat = "\177"}(gdb) printf "%x\n",  (struct varlena
*)datum.vl_len7f7f7f7e

That 0x7f7f7f7e looks quite strange.

Frame 2 reports attr, which is the same as datum:
 (gdb) print *attr $25 = {va_header = 2139062142, va_content = {va_compressed = {     va_rawsize = 2139062143, va_data
="�"}, va_external = {     va_rawsize = 2139062143, va_extsize = 137491692, va_valueid = 16,      va_toastrelid = 0,
va_toastidxid= 2139062143, va_rowid = 2139062143,      va_attno = 32639}, va_data = "\177"}}
 

Obviously the length is huge and PostgreSQL fails on the memory
allocation.'

Here is someone reporting the same problem in August:
http://fts.postgresql.org/db/mw/msg.html?mid=1029724

VACUUM ANALYZE has the same failure as "SELECT" because it accesses
every value in the table.  Unfortunately, I don't see an answer supplied
to this bug report.  This is Tom Lane's reply:
http://fts.postgresql.org/db/mw/msg.html?mid=1029778

The general problem appears to be that certain toast rows have improper
lengths or settings.  The problem appears only with the 'woman' and
'question_num' columns in that row.  The other rows are fine.

Seems this is may be an unknown problem.  I did a search for TOAST and
the MemoryContextAlloc message:

http://fts.postgresql.org/db/mw/index.html?section=-1&word=toast+MemoryContextAlloc&action=Search&sdb_d=25&sdb_m=1&sdb_y=2001&sde_d=26&sde_m=12&sde_y=2001&weight=0&format=0&order=1

Comments?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: History question
Next
From: "Pavlo Baron"
Date:
Subject: patch: INSERT INTO t VALUES (a, b, ..., DEFAULT, ...)