Thread: PL/pgSQL question about EXCEPTION clause & corrupt records

PL/pgSQL question about EXCEPTION clause & corrupt records

From
"Nick Renders"
Date:
Hello,

We recently suffered a database crash which resulted in some corrupt 
records.

I thought I would write a little PL script that would loop through all 
the data and report any inconsistencies. However, I can't get it to work 
properly.

For instance, if I run the following statement in pgAdmin:

    SELECT * FROM f_gsxws_transaction WHERE gwta_number = 762513

I get the following message:

    ERROR:  missing chunk number 0 for toast value 8289525 in 
pg_toast_5572299


So, as a test, I created a function that would just retrieve that one 
record:

    DECLARE
        rcontent f_gsxws_transaction%ROWTYPE;
    BEGIN
        SELECT * INTO rcontent FROM f_gsxws_transaction where gwta_number = 
762513;
        RETURN rcontent;
    EXCEPTION WHEN OTHERS THEN
        RAISE NOTICE 'Record 762513 is corrupt';
    END;


Now, when I run this function, I have noticed two things:

1) The function has no problem executing the SELECT statement. It is 
only when "rcontents" is returned, that the function fails. This is a 
problem, because the ultimate goal is to loop through all records and 
only return/alert something in case of an error.

2) The function never enters the EXCEPTION clause. Instead, when it hits 
the RETURN command, it breaks and shows the same message as in pgAdmin: 
missing chunk number 0 for toast value 8289525 in pg_toast_5572299.


Is it possible to check for these kind of errors with a PL script? Or is 
there perhaps a better way to check for corrupt records in a database?

Best regards,

Nick Renders



Re: PL/pgSQL question about EXCEPTION clause & corrupt records

From
Achilleas Mantzios
Date:
On 14/2/20 2:39 μ.μ., Nick Renders wrote:
>
> Hello,
>
> We recently suffered a database crash which resulted in some corrupt records.
>
> I thought I would write a little PL script that would loop through all the data and report any inconsistencies.
However,I can't get it to work properly.
 
>
> For instance, if I run the following statement in pgAdmin:
>
>     SELECT * FROM f_gsxws_transaction WHERE gwta_number = 762513
>
> I get the following message:
>
>     ERROR:  missing chunk number 0 for toast value 8289525 in pg_toast_5572299
>
>
> So, as a test, I created a function that would just retrieve that one record:
>
>     DECLARE
>         rcontent f_gsxws_transaction%ROWTYPE;
>     BEGIN
>         SELECT * INTO rcontent FROM f_gsxws_transaction where gwta_number = 762513;
>         RETURN rcontent;
>     EXCEPTION WHEN OTHERS THEN
>         RAISE NOTICE 'Record 762513 is corrupt';
>     END;
>
>
> Now, when I run this function, I have noticed two things:
>
> 1) The function has no problem executing the SELECT statement. It is only when "rcontents" is returned, that the
functionfails. This is a problem, because the ultimate goal is to loop through all 
 
> records and only return/alert something in case of an error.
>
> 2) The function never enters the EXCEPTION clause. Instead, when it hits the RETURN command, it breaks and shows the
samemessage as in pgAdmin: missing chunk number 0 for toast value 8289525 in 
 
> pg_toast_5572299.
Does the table have any PKs or UKs?
do something like

FOR vid IN SELECT <somepkid> FROM f_gsxws_transaction where gwta_number = 762513 ORDER BY <somepkid> LOOP
     RAISE NOTICE 'examining row with <somepkid>= %',vid;
     select * into rcontent FROM f_gsxws_transaction where <somepkid> = vid;
     RAISE NOTICE 'content of row <somepkid>= % , is % ',vid,rcontent;
END LOOP;

>
>
> Is it possible to check for these kind of errors with a PL script? Or is there perhaps a better way to check for
corruptrecords in a database?
 
>
> Best regards,
>
> Nick Renders
>
>


-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt




Re: PL/pgSQL question about EXCEPTION clause & corrupt records

From
Tom Lane
Date:
"Nick Renders" <postgres@arcict.com> writes:
> I thought I would write a little PL script that would loop through all 
> the data and report any inconsistencies. However, I can't get it to work 
> properly.
> ...
> 1) The function has no problem executing the SELECT statement. It is 
> only when "rcontents" is returned, that the function fails. This is a 
> problem, because the ultimate goal is to loop through all records and 
> only return/alert something in case of an error.
> 2) The function never enters the EXCEPTION clause. Instead, when it hits 
> the RETURN command, it breaks and shows the same message as in pgAdmin: 
> missing chunk number 0 for toast value 8289525 in pg_toast_5572299.

I think what's happening there is that the function doesn't try to
dereference the value's TOAST pointer during SELECT INTO.  It just stores
that pointer into a variable, and only sometime later when the actual
content of the value is demanded, do you see the error raised.

The solution to that is to do something that uses the contents of the
busted column right away while still inside the EXCEPTION block, perhaps
along the lines of "select md5(mycolumn) into local_variable from..."

A close reading of

https://www.postgresql.org/docs/current/storage-toast.html

would probably help you understand what's happening here.

            regards, tom lane



Re: PL/pgSQL question about EXCEPTION clause & corrupt records

From
Jeremy Schneider
Date:
> On Feb 14, 2020, at 04:39, Nick Renders <postgres@arcict.com> wrote:
>
> I get the following message:
>
>   ERROR:  missing chunk number 0 for toast value 8289525 in pg_toast_5572299

What version of PostgreSQL are you running? I’ve seen this a number of times the past couple years; curious if the
lurkingbug is still observed in latest versions. 

-Jeremy

Sent from my TI-83



Re: PL/pgSQL question about EXCEPTION clause & corrupt records

From
"Nick Renders"
Date:
The problem is that I don't know which column is corrupt. But I found a 
solution: by simply copying the record into another variable, the values 
are parsed and the TOAST errors are thrown.

In case anyone's interested, here's my code, based on an example from 
http://www.databasesoup.com/2013/10/de-corrupting-toast-tables.html


DO $f$
DECLARE
    rContent1 record;
    rContent2 record;
    iCounter integer DEFAULT 1;
    iValue integer;
    pTableName varchar := 'f_gsxws_transaction';
    pFieldName varchar := 'gwta_number';
BEGIN
    FOR iValue IN EXECUTE 'SELECT ' || pFieldName || ' FROM ' || 
pTableName::regclass || ' ORDER BY ' || pFieldName LOOP
        BEGIN
            EXECUTE 'SELECT * FROM ' || pTableName::regclass || ' WHERE ' || 
pFieldName || ' = $1'
                INTO rContent1
                USING iValue;
            rContent2 := rContent1;
        EXCEPTION WHEN OTHERS THEN
            RAISE NOTICE 'data for %.% % is corrupt', pTableName, pFieldName, 
iValue;
        END;
        IF iCounter % 100000 = 0 THEN
            RAISE NOTICE '% % records checked', iCounter, pTableName;
        END IF;
        iCounter := iCounter+1;
    END LOOP;
END;
$f$;


Cheers,

Nick


On 14 Feb 2020, at 16:14, Tom Lane wrote:

> "Nick Renders" <postgres@arcict.com> writes:
>> I thought I would write a little PL script that would loop through 
>> all
>> the data and report any inconsistencies. However, I can't get it to 
>> work
>> properly.
>> ...
>> 1) The function has no problem executing the SELECT statement. It is
>> only when "rcontents" is returned, that the function fails. This is a
>> problem, because the ultimate goal is to loop through all records and
>> only return/alert something in case of an error.
>> 2) The function never enters the EXCEPTION clause. Instead, when it 
>> hits
>> the RETURN command, it breaks and shows the same message as in 
>> pgAdmin:
>> missing chunk number 0 for toast value 8289525 in pg_toast_5572299.
>
> I think what's happening there is that the function doesn't try to
> dereference the value's TOAST pointer during SELECT INTO.  It just 
> stores
> that pointer into a variable, and only sometime later when the actual
> content of the value is demanded, do you see the error raised.
>
> The solution to that is to do something that uses the contents of the
> busted column right away while still inside the EXCEPTION block, 
> perhaps
> along the lines of "select md5(mycolumn) into local_variable from..."
>
> A close reading of
>
> https://www.postgresql.org/docs/current/storage-toast.html
>
> would probably help you understand what's happening here.
>
>             regards, tom lane



Re: PL/pgSQL question about EXCEPTION clause & corrupt records

From
"Nick Renders"
Date:
Hi Jeremy,

This happend on PostgreSQL v9.6 which crashed 2 weeks ago.
Since then we have upgraded and restored our server, but my example is 
from the older, corrupt database.

Nick


On 15 Feb 2020, at 5:30, Jeremy Schneider wrote:

>> On Feb 14, 2020, at 04:39, Nick Renders <postgres@arcict.com> wrote:
>>
>> I get the following message:
>>
>>   ERROR:  missing chunk number 0 for toast value 8289525 in 
>> pg_toast_5572299
>
> What version of PostgreSQL are you running? I’ve seen this a number 
> of times the past couple years; curious if the lurking bug is still 
> observed in latest versions.
>
> -Jeremy
>
> Sent from my TI-83



Re: PL/pgSQL question about EXCEPTION clause & corrupt records

From
Jeremy Schneider
Date:
FWIW, Bertrand blogged an even faster way to do this about a month ago - using pageinspect and processing blocks instead of rows

https://bdrouvot.wordpress.com/2020/01/18/retrieve-postgresql-variable-length-storage-information-thanks-to-pageinspect/

-J

Sent from my TI-83

On Feb 17, 2020, at 03:32, Nick Renders <postgres@arcict.com> wrote:

The problem is that I don't know which column is corrupt. But I found a solution: by simply copying the record into another variable, the values are parsed and the TOAST errors are thrown.

In case anyone's interested, here's my code, based on an example from http://www.databasesoup.com/2013/10/de-corrupting-toast-tables.html


DO $f$
DECLARE
   rContent1 record;
   rContent2 record;
   iCounter integer DEFAULT 1;
   iValue integer;
   pTableName varchar := 'f_gsxws_transaction';
   pFieldName varchar := 'gwta_number';
BEGIN
   FOR iValue IN EXECUTE 'SELECT ' || pFieldName || ' FROM ' || pTableName::regclass || ' ORDER BY ' || pFieldName LOOP
       BEGIN
           EXECUTE 'SELECT * FROM ' || pTableName::regclass || ' WHERE ' || pFieldName || ' = $1'
               INTO rContent1
               USING iValue;
           rContent2 := rContent1;
       EXCEPTION WHEN OTHERS THEN
           RAISE NOTICE 'data for %.% % is corrupt', pTableName, pFieldName, iValue;
       END;
       IF iCounter % 100000 = 0 THEN
           RAISE NOTICE '% % records checked', iCounter, pTableName;
       END IF;
       iCounter := iCounter+1;
   END LOOP;
END;
$f$;


Cheers,

Nick


On 14 Feb 2020, at 16:14, Tom Lane wrote:

"Nick Renders" <postgres@arcict.com> writes:
I thought I would write a little PL script that would loop through all
the data and report any inconsistencies. However, I can't get it to work
properly.
...
1) The function has no problem executing the SELECT statement. It is
only when "rcontents" is returned, that the function fails. This is a
problem, because the ultimate goal is to loop through all records and
only return/alert something in case of an error.
2) The function never enters the EXCEPTION clause. Instead, when it hits
the RETURN command, it breaks and shows the same message as in pgAdmin:
missing chunk number 0 for toast value 8289525 in pg_toast_5572299.

I think what's happening there is that the function doesn't try to
dereference the value's TOAST pointer during SELECT INTO.  It just stores
that pointer into a variable, and only sometime later when the actual
content of the value is demanded, do you see the error raised.

The solution to that is to do something that uses the contents of the
busted column right away while still inside the EXCEPTION block, perhaps
along the lines of "select md5(mycolumn) into local_variable from..."

A close reading of

https://www.postgresql.org/docs/current/storage-toast.html

would probably help you understand what's happening here.

           regards, tom lane


Re: PL/pgSQL question about EXCEPTION clause & corrupt records

From
Jeremy Schneider
Date:
On 2/17/20 08:23, Jeremy Schneider wrote:
> FWIW, Bertrand blogged an even faster way to do this about a month ago -
> using pageinspect and processing blocks instead of rows
> 
>
https://bdrouvot.wordpress.com/2020/01/18/retrieve-postgresql-variable-length-storage-information-thanks-to-pageinspect/
> 
> 
>> On Feb 17, 2020, at 03:32, Nick Renders <postgres@arcict.com> wrote:
>>
>> The problem is that I don't know which column is corrupt. But I found
>> a solution: by simply copying the record into another variable, the
>> values are parsed and the TOAST errors are thrown.
>>
>> In case anyone's interested, here's my code, based on an example from
>> http://www.databasesoup.com/2013/10/de-corrupting-toast-tables.html

Apologies - sent that slightly rushed email while I was taking the bus
to the office this morning and linked the wrong blog post :D

Here's the right one, showing a much faster way to identify which tuple
links to a bad toast row:


https://bdrouvot.wordpress.com/2020/01/04/get-toast-chunk_id-from-the-user-table-tuples-or-from-the-toast-index-thanks-to-pageinspect/

-Jeremy

-- 
http://about.me/jeremy_schneider