Hi Tom,
We may actually have found the cause of the problem. Last night we found
several old rows in the jms_messages table which should have been gone a
long time ago. Once we deleted them, everything went back to normal, with
no more ERROR messages. We suspect these rows were corrupted.
Thanks a lot for your help.
Pius
________________________________________
From: Tom Lane [tgl@sss.pgh.pa.us]
Sent: Tuesday, January 22, 2013 8:14 PM
To: Pius Chan
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #7819: missing chunk number 0 for toast value 1235919 in pg_toast_35328
Pius Chan <pchan@contigo.com> writes:
> Hi Tom,
> Yes, we start seeing this ERROR after upgrade to 9.1.7. The ERROR is from
> the "cluster jms_messages" command. Last night, the ERROR happened three
> times:
> (1) at 00:20:01
> ERROR: missing chunk number 0 for toast value 1239124 in pg_toast_35328
> (2) at 00:25:01
> ERROR: missing chunk number 0 for toast value 1239124 in pg_toast_35328
> (3) at 00:35:01
> ERROR: missing chunk number 0 for toast value 1241022 in pg_toast_35328
> The "cluster jms_messages" runs every 5 minutes. However, so far, it seems
> that the ERROR happens at about mid-night and 35328 is the toast area of
> the "jms_message" table:
So what else is this application doing around midnight? In particular,
it seems like something must've happened between 00:15 and 00:20 to
create the problem with OID 1239124, and then something else happened
between 00:25 and 00:30 to get rid of it. And then a similar sequence
happened between 00:30 and 00:40 involving OID 1241022. Most likely the
trigger events are application actions against jms_messages, but we have
no info about what. (It's also possible that autovacuums of
jms_messages are involved, so you might want to crank up your logging
enough so you can see those in the postmaster log.)
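For the autovacuum logging, a minimal postgresql.conf sketch follows. It
assumes that logging every autovacuum run is acceptable on this system; the
value 0 is a suggestion for the duration of the investigation, not a setting
taken from this thread:

```
# postgresql.conf (illustrative fragment)
# Log every autovacuum action, regardless of how long it took.
# -1 disables this logging; a positive value logs only runs that
# took at least that many milliseconds.
log_autovacuum_min_duration = 0
```

The setting takes effect after a configuration reload (for example
"pg_ctl reload" or "SELECT pg_reload_conf();"), with no server restart needed.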
I had first thought that this might have something to do with the
toast-value-processing changes we made in CLUSTER a year or so ago,
but a look in the commit logs confirms that those patches were in 9.1.3.
So I have no idea what might've broken between 9.1.3 and 9.1.7. We
need data, or even better an actual test case.
regards, tom lane