Problem with TOAST column corruption - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Problem with TOAST column corruption |
Date | |
Msg-id | 200112270614.fBR6EoH18425@candle.pha.pa.us Whole thread Raw |
Responses |
Re: Problem with TOAST column corruption
|
List | pgsql-hackers |
I have a /data directory from someone running 7.1.3 and they are seeing data corruption on a table using TOAST columns. Specifically, they are seeing this:test=> SELECT woman FROM user_details WHERE uid = '00eezEoLyWJK';ERROR: Relation 1 does not exist test=> SELECT question_num FROM user_details WHERE uid = '00eezEoLyWJK';ERROR: MemoryContextAlloc: invalid request size2139062147 The other columns in the same row are fine, and the other rows in the table are fine too. This is with RH Linux 6.2. They have not turned off fsync. The backtrace for the first failure is:#0 elog (lev=-1, fmt=0x813a5de "Relation %u does not exist") at elog.c:119#1 0x806ef17in heap_open (relationId=1, lockmode=1) at heapam.c:589#2 0x80736a5 in toast_fetch_datum (attr=0x2b5bd124) at tuptoaster.c:972#3 0x80720bc in heap_tuple_untoast_attr (attr=0x2b5bd124) at tuptoaster.c:127#4 0x812f556 in pg_detoast_datum(datum=0x2b5bd124) at fmgr.c:1434#5 0x80678a1 in printtup (tuple=0x2b5bd020, typeinfo=0x8335fb0, self=0x83498f0)at printtup.c:206#6 0x80b6609 in ExecRetrieve (slot=0x833d608, destfunc=0x83498f0, estate=0x833d338)at execMain.c:1187#7 0x80b6523 in ExecutePlan (estate=0x833d338, plan=0x833d2b0, operation=CMD_SELECT,numberTuples=0, direction=ForwardScanDirection, destfunc=0x83498f0) at execMain.c:1107#8 0x80b5b33in ExecutorRun (queryDesc=0x8335f20, estate=0x833d338, feature=3, count=0) at execMain.c:233#9 0x80f6ce3 inProcessQuery (parsetree=0x8328e38, plan=0x833d2b0, dest=Remote) at pquery.c:295#10 0x80f58eb in pg_exec_query_string( query_string=0x8328a68 "select * from user_details where uid = '00eezEoLyWJK';", dest=Remote, parse_context=0x8302b48)at postgres.c:810 Obviously, it is a TOAST-related problem. The error is happening while the Datum is trying to be untoasted. If I look at the Datum for the first failure I see: (gdb) print *(struct varlena *)datum$1 = {vl_len = 134507004, vl_dat = "c"} (gdb) printf "%x\n", (struct varlena *)datum.vl_len.vl_len80469fc As you can see, the high bit 0x08 is set, meaning that data is external, and 0x04 is not set, meaning it is not compressed. However, when it attempts to find the TOAST value, it fails trying to open a relation with oid equal to 1; obviously a problem. Here is the frame of 'attr' which has the improper relid: (gdb) print *attr $8 = {va_header = -1072362015, va_content = {va_compressed = {va_rawsize = 1, va_data = ""}, va_external= {va_rawsize = 1, va_extsize = 0, va_valueid = 47, va_toastrelid = 1, va_toastidxid = 0, va_rowid = 0, ^^^^^^^^^^^^^^^^^ va_attno = 0}, va_data = "\001"}} This is the varattrib structure stored on disk for TOAST entries. Obviously something is wrong there. But how, and does this problem still exist in current sources? Was it already fixed? The MemoryContextAlloc error backtrace is:#0 elog (lev=-1, fmt=0x81bf278 "MemoryContextAlloc: invalid request size %lu") at elog.c:120#1 0x81682ae in MemoryContextAlloc (context=0x8294800, size=2139062147) at mcxt.c:418#2 0x8078ca5in heap_tuple_untoast_attr (attr=0x831f54c) at tuptoaster.c:151#3 0x8162888 in pg_detoast_datum (datum=0x831f54c)at fmgr.c:1434#4 0x8066dac in printtup (tuple=0x831f514, typeinfo=0x831f144, self=0x831f4cc) at printtup.c:206#5 0x80d1088 in ExecRetrieve (slot=0x831ee80, destfunc=0x831f4cc, estate=0x831e7c0) at execMain.c:1187#6 0x80d0fd7 in ExecutePlan (estate=0x831e7c0, plan=0x831e734, operation=CMD_SELECT, numberTuples=0, direction=ForwardScanDirection, destfunc=0x831f4cc) at execMain.c:1107#7 0x80d02d6 in ExecutorRun (queryDesc=0x831ee1c,estate=0x831e7c0, feature=3, count=0) at execMain.c:233#8 0x81221cb in ProcessQuery (parsetree=0x831a45c,plan=0x831e734, dest=Remote) at pquery.c:295#9 0x8120aa1 in pg_exec_query_string ( query_string=0x831a038"select question_num from user_details where uid = '00eezEoLyWJK';", dest=Remote, parse_context=0x82946e8)at postgres.c:810#10 0x8121c38 in PostgresMain (argc=5, argv=0x8046f1c, real_argc=6, real_argv=0x804786c,username=0x8259661 "postgres") at postgres.c:1908#11 0x81078b0 in DoBackend (port=0x8259400) at postmaster.c:2114#120x81073f6 in BackendStartup (port=0x8259400) at postmaster.c:1897#13 0x8106579 in ServerLoop () at postmaster.c:995#140x8105f59 in PostmasterMain (argc=6, argv=0x804786c) at postmaster.c:685#15 0x80e2ed2 in main (argc=6,argv=0x804786c) at main.c:171#16 0x8064c4e in __start () Frame 3 reports datum as: (gdb) print *(struct varlena *)datum$5 = {vl_len = 2139062142, vl_dat = "\177"}(gdb) printf "%x\n", (struct varlena *)datum.vl_len7f7f7f7e That 0x7f7f7f7e looks quite strange. Frame 2 reports attr, which is the same as datum: (gdb) print *attr $25 = {va_header = 2139062142, va_content = {va_compressed = { va_rawsize = 2139062143, va_data ="�"}, va_external = { va_rawsize = 2139062143, va_extsize = 137491692, va_valueid = 16, va_toastrelid = 0, va_toastidxid= 2139062143, va_rowid = 2139062143, va_attno = 32639}, va_data = "\177"}} Obviously the length is huge and PostgreSQL fails on the memory allocation.' Here is someone reporting the same problem in August: http://fts.postgresql.org/db/mw/msg.html?mid=1029724 VACUUM ANALYZE has the same failure as "SELECT" because it accesses every value in the table. Unfortunately, I don't see an answer supplied to this bug report. This is Tom Lane's reply: http://fts.postgresql.org/db/mw/msg.html?mid=1029778 The general problem appears to be that certain toast rows have improper lengths or settings. The problem appears only with the 'woman' and 'question_num' columns in that row. The other rows are fine. Seems this is may be an unknown problem. I did a search for TOAST and the MemoryContextAlloc message: http://fts.postgresql.org/db/mw/index.html?section=-1&word=toast+MemoryContextAlloc&action=Search&sdb_d=25&sdb_m=1&sdb_y=2001&sde_d=26&sde_m=12&sde_y=2001&weight=0&format=0&order=1 Comments? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
pgsql-hackers by date: