Thread: PL/pgSQL question about EXCEPTION clause & corrupt records
Hello, We recently suffered a database crash which resulted in some corrupt records. I thought I would write a little PL script that would loop through all the data and report any inconsistencies. However, I can't get it to work properly. For instance, if I run the following statement in pgAdmin: SELECT * FROM f_gsxws_transaction WHERE gwta_number = 762513 I get the following message: ERROR: missing chunk number 0 for toast value 8289525 in pg_toast_5572299 So, as a test, I created a function that would just retrieve that one record: DECLARE rcontent f_gsxws_transaction%ROWTYPE; BEGIN SELECT * INTO rcontent FROM f_gsxws_transaction where gwta_number = 762513; RETURN rcontent; EXCEPTION WHEN OTHERS THEN RAISE NOTICE 'Record 762513 is corrupt'; END; Now, when I run this function, I have noticed two things: 1) The function has no problem executing the SELECT statement. It is only when "rcontents" is returned, that the function fails. This is a problem, because the ultimate goal is to loop through all records and only return/alert something in case of an error. 2) The function never enters the EXCEPTION clause. Instead, when it hits the RETURN command, it breaks and shows the same message as in pgAdmin: missing chunk number 0 for toast value 8289525 in pg_toast_5572299. Is it possible to check for these kind of errors with a PL script? Or is there perhaps a better way to check for corrupt records in a database? Best regards, Nick Renders
On 14/2/20 2:39 μ.μ., Nick Renders wrote: > > Hello, > > We recently suffered a database crash which resulted in some corrupt records. > > I thought I would write a little PL script that would loop through all the data and report any inconsistencies. However,I can't get it to work properly. > > For instance, if I run the following statement in pgAdmin: > > SELECT * FROM f_gsxws_transaction WHERE gwta_number = 762513 > > I get the following message: > > ERROR: missing chunk number 0 for toast value 8289525 in pg_toast_5572299 > > > So, as a test, I created a function that would just retrieve that one record: > > DECLARE > rcontent f_gsxws_transaction%ROWTYPE; > BEGIN > SELECT * INTO rcontent FROM f_gsxws_transaction where gwta_number = 762513; > RETURN rcontent; > EXCEPTION WHEN OTHERS THEN > RAISE NOTICE 'Record 762513 is corrupt'; > END; > > > Now, when I run this function, I have noticed two things: > > 1) The function has no problem executing the SELECT statement. It is only when "rcontents" is returned, that the functionfails. This is a problem, because the ultimate goal is to loop through all > records and only return/alert something in case of an error. > > 2) The function never enters the EXCEPTION clause. Instead, when it hits the RETURN command, it breaks and shows the samemessage as in pgAdmin: missing chunk number 0 for toast value 8289525 in > pg_toast_5572299. Does the table have any PKs or UKs? do something like FOR vid IN SELECT <somepkid> FROM f_gsxws_transaction where gwta_number = 762513 ORDER BY <somepkid> LOOP RAISE NOTICE 'examining row with <somepkid>= %',vid; select * into rcontent FROM f_gsxws_transaction where <somepkid> = vid; RAISE NOTICE 'content of row <somepkid>= % , is % ',vid,rcontent; END LOOP; > > > Is it possible to check for these kind of errors with a PL script? Or is there perhaps a better way to check for corruptrecords in a database? > > Best regards, > > Nick Renders > > -- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
"Nick Renders" <postgres@arcict.com> writes: > I thought I would write a little PL script that would loop through all > the data and report any inconsistencies. However, I can't get it to work > properly. > ... > 1) The function has no problem executing the SELECT statement. It is > only when "rcontents" is returned, that the function fails. This is a > problem, because the ultimate goal is to loop through all records and > only return/alert something in case of an error. > 2) The function never enters the EXCEPTION clause. Instead, when it hits > the RETURN command, it breaks and shows the same message as in pgAdmin: > missing chunk number 0 for toast value 8289525 in pg_toast_5572299. I think what's happening there is that the function doesn't try to dereference the value's TOAST pointer during SELECT INTO. It just stores that pointer into a variable, and only sometime later when the actual content of the value is demanded, do you see the error raised. The solution to that is to do something that uses the contents of the busted column right away while still inside the EXCEPTION block, perhaps along the lines of "select md5(mycolumn) into local_variable from..." A close reading of https://www.postgresql.org/docs/current/storage-toast.html would probably help you understand what's happening here. regards, tom lane
> On Feb 14, 2020, at 04:39, Nick Renders <postgres@arcict.com> wrote: > > I get the following message: > > ERROR: missing chunk number 0 for toast value 8289525 in pg_toast_5572299 What version of PostgreSQL are you running? I’ve seen this a number of times the past couple years; curious if the lurkingbug is still observed in latest versions. -Jeremy Sent from my TI-83
The problem is that I don't know which column is corrupt. But I found a solution: by simply copying the record into another variable, the values are parsed and the TOAST errors are thrown. In case anyone's interested, here's my code, based on an example from http://www.databasesoup.com/2013/10/de-corrupting-toast-tables.html DO $f$ DECLARE rContent1 record; rContent2 record; iCounter integer DEFAULT 1; iValue integer; pTableName varchar := 'f_gsxws_transaction'; pFieldName varchar := 'gwta_number'; BEGIN FOR iValue IN EXECUTE 'SELECT ' || pFieldName || ' FROM ' || pTableName::regclass || ' ORDER BY ' || pFieldName LOOP BEGIN EXECUTE 'SELECT * FROM ' || pTableName::regclass || ' WHERE ' || pFieldName || ' = $1' INTO rContent1 USING iValue; rContent2 := rContent1; EXCEPTION WHEN OTHERS THEN RAISE NOTICE 'data for %.% % is corrupt', pTableName, pFieldName, iValue; END; IF iCounter % 100000 = 0 THEN RAISE NOTICE '% % records checked', iCounter, pTableName; END IF; iCounter := iCounter+1; END LOOP; END; $f$; Cheers, Nick On 14 Feb 2020, at 16:14, Tom Lane wrote: > "Nick Renders" <postgres@arcict.com> writes: >> I thought I would write a little PL script that would loop through >> all >> the data and report any inconsistencies. However, I can't get it to >> work >> properly. >> ... >> 1) The function has no problem executing the SELECT statement. It is >> only when "rcontents" is returned, that the function fails. This is a >> problem, because the ultimate goal is to loop through all records and >> only return/alert something in case of an error. >> 2) The function never enters the EXCEPTION clause. Instead, when it >> hits >> the RETURN command, it breaks and shows the same message as in >> pgAdmin: >> missing chunk number 0 for toast value 8289525 in pg_toast_5572299. > > I think what's happening there is that the function doesn't try to > dereference the value's TOAST pointer during SELECT INTO. It just > stores > that pointer into a variable, and only sometime later when the actual > content of the value is demanded, do you see the error raised. > > The solution to that is to do something that uses the contents of the > busted column right away while still inside the EXCEPTION block, > perhaps > along the lines of "select md5(mycolumn) into local_variable from..." > > A close reading of > > https://www.postgresql.org/docs/current/storage-toast.html > > would probably help you understand what's happening here. > > regards, tom lane
Hi Jeremy, This happend on PostgreSQL v9.6 which crashed 2 weeks ago. Since then we have upgraded and restored our server, but my example is from the older, corrupt database. Nick On 15 Feb 2020, at 5:30, Jeremy Schneider wrote: >> On Feb 14, 2020, at 04:39, Nick Renders <postgres@arcict.com> wrote: >> >> I get the following message: >> >> ERROR: missing chunk number 0 for toast value 8289525 in >> pg_toast_5572299 > > What version of PostgreSQL are you running? I’ve seen this a number > of times the past couple years; curious if the lurking bug is still > observed in latest versions. > > -Jeremy > > Sent from my TI-83
FWIW, Bertrand blogged an even faster way to do this about a month ago - using pageinspect and processing blocks instead of rows
https://bdrouvot.wordpress.com/2020/01/18/retrieve-postgresql-variable-length-storage-information-thanks-to-pageinspect/
-J
-J
Sent from my TI-83
On Feb 17, 2020, at 03:32, Nick Renders <postgres@arcict.com> wrote:
The problem is that I don't know which column is corrupt. But I found a solution: by simply copying the record into another variable, the values are parsed and the TOAST errors are thrown.
In case anyone's interested, here's my code, based on an example from http://www.databasesoup.com/2013/10/de-corrupting-toast-tables.html
DO $f$
DECLARE
rContent1 record;
rContent2 record;
iCounter integer DEFAULT 1;
iValue integer;
pTableName varchar := 'f_gsxws_transaction';
pFieldName varchar := 'gwta_number';
BEGIN
FOR iValue IN EXECUTE 'SELECT ' || pFieldName || ' FROM ' || pTableName::regclass || ' ORDER BY ' || pFieldName LOOP
BEGIN
EXECUTE 'SELECT * FROM ' || pTableName::regclass || ' WHERE ' || pFieldName || ' = $1'
INTO rContent1
USING iValue;
rContent2 := rContent1;
EXCEPTION WHEN OTHERS THEN
RAISE NOTICE 'data for %.% % is corrupt', pTableName, pFieldName, iValue;
END;
IF iCounter % 100000 = 0 THEN
RAISE NOTICE '% % records checked', iCounter, pTableName;
END IF;
iCounter := iCounter+1;
END LOOP;
END;
$f$;
Cheers,
Nick
On 14 Feb 2020, at 16:14, Tom Lane wrote:"Nick Renders" <postgres@arcict.com> writes:I thought I would write a little PL script that would loop through allthe data and report any inconsistencies. However, I can't get it to workproperly....1) The function has no problem executing the SELECT statement. It isonly when "rcontents" is returned, that the function fails. This is aproblem, because the ultimate goal is to loop through all records andonly return/alert something in case of an error.2) The function never enters the EXCEPTION clause. Instead, when it hitsthe RETURN command, it breaks and shows the same message as in pgAdmin:missing chunk number 0 for toast value 8289525 in pg_toast_5572299.I think what's happening there is that the function doesn't try todereference the value's TOAST pointer during SELECT INTO. It just storesthat pointer into a variable, and only sometime later when the actualcontent of the value is demanded, do you see the error raised.The solution to that is to do something that uses the contents of thebusted column right away while still inside the EXCEPTION block, perhapsalong the lines of "select md5(mycolumn) into local_variable from..."A close reading ofhttps://www.postgresql.org/docs/current/storage-toast.htmlwould probably help you understand what's happening here.regards, tom lane
On 2/17/20 08:23, Jeremy Schneider wrote: > FWIW, Bertrand blogged an even faster way to do this about a month ago - > using pageinspect and processing blocks instead of rows > > https://bdrouvot.wordpress.com/2020/01/18/retrieve-postgresql-variable-length-storage-information-thanks-to-pageinspect/ > > >> On Feb 17, 2020, at 03:32, Nick Renders <postgres@arcict.com> wrote: >> >> The problem is that I don't know which column is corrupt. But I found >> a solution: by simply copying the record into another variable, the >> values are parsed and the TOAST errors are thrown. >> >> In case anyone's interested, here's my code, based on an example from >> http://www.databasesoup.com/2013/10/de-corrupting-toast-tables.html Apologies - sent that slightly rushed email while I was taking the bus to the office this morning and linked the wrong blog post :D Here's the right one, showing a much faster way to identify which tuple links to a bad toast row: https://bdrouvot.wordpress.com/2020/01/04/get-toast-chunk_id-from-the-user-table-tuples-or-from-the-toast-index-thanks-to-pageinspect/ -Jeremy -- http://about.me/jeremy_schneider