Thread: How am I supposed to fix this?

How am I supposed to fix this?

From
Larry Rosenman
Date:
I'm getting the below, and am unaware of how to fix it....

11.4 on FreeBSD 12.



ler=# reindex (verbose) table dns_query ;
INFO:  index "dns_query_pkey" was reindexed
DETAIL:  CPU: user: 114.29 s, system: 207.94 s, elapsed: 698.87 s
ERROR:  index "pg_toast_17760_index" contains unexpected zero page at 
block 23686
HINT:  Please REINDEX it.
CONTEXT:  parallel worker
ler=# reindex index pg_toast_17760_index;
ERROR:  relation "pg_toast_17760_index" does not exist
ler=# reindex (verbose) database ler;
INFO:  index "pg_class_oid_index" was reindexed
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO:  index "pg_class_relname_nsp_index" was reindexed
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO:  index "pg_class_tblspc_relfilenode_index" was reindexed
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO:  table "pg_catalog.pg_class" was reindexed
load: 14.53  cmd: psql 2675 [select] 2765.27r 0.01u 0.01s 0% 8292k
INFO:  index "dns_query_pkey" was reindexed
DETAIL:  CPU: user: 112.91 s, system: 205.51 s, elapsed: 688.28 s
ERROR:  index "pg_toast_17760_index" contains unexpected zero page at 
block 23686
HINT:  Please REINDEX it.
ler=#

ler=# select version();
                                                                        
version

-----------------------------------------------------------------------------------------------------------------------------------------------------
  PostgreSQL 11.4 on amd64-portbld-freebsd12.0, compiled by FreeBSD clang 
version 8.0.0 (tags/RELEASE_800/final 356365) (based on LLVM 8.0.0), 
64-bit
(1 row)

ler=#

-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 214-642-9640                 E-Mail: ler@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: How am I supposed to fix this?

From
Tomas Vondra
Date:
On Tue, Aug 06, 2019 at 12:06:45PM -0500, Larry Rosenman wrote:
>I'm getting the below, and am unaware of how to fix it....
>
>11.4 on FreeBSD 12.
>
>
>
>ler=# reindex (verbose) table dns_query ;
>INFO:  index "dns_query_pkey" was reindexed
>DETAIL:  CPU: user: 114.29 s, system: 207.94 s, elapsed: 698.87 s
>ERROR:  index "pg_toast_17760_index" contains unexpected zero page at 
>block 23686
>HINT:  Please REINDEX it.
>CONTEXT:  parallel worker
>ler=# reindex index pg_toast_17760_index;
>ERROR:  relation "pg_toast_17760_index" does not exist

You probably need to explicitly say pg_toast.pg_toast_17760_index here,
because pg_toast schema is not part of the search_path by default.

>ler=# reindex (verbose) database ler;
>INFO:  index "pg_class_oid_index" was reindexed
>DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
>INFO:  index "pg_class_relname_nsp_index" was reindexed
>DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
>INFO:  index "pg_class_tblspc_relfilenode_index" was reindexed
>DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
>INFO:  table "pg_catalog.pg_class" was reindexed
>load: 14.53  cmd: psql 2675 [select] 2765.27r 0.01u 0.01s 0% 8292k
>INFO:  index "dns_query_pkey" was reindexed
>DETAIL:  CPU: user: 112.91 s, system: 205.51 s, elapsed: 688.28 s
>ERROR:  index "pg_toast_17760_index" contains unexpected zero page at 
>block 23686
>HINT:  Please REINDEX it.
>ler=#
>

Assuming the toast index is corrupted, this is kinda expected (when trying
to reindex an index on the toasted data).

The question is how much other data corruption is there ...


regards
-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: How am I supposed to fix this?

From
Alvaro Herrera
Date:
On 2019-Aug-06, Larry Rosenman wrote:

> ler=# reindex index pg_toast_17760_index;
> ERROR:  relation "pg_toast_17760_index" does not exist

Maybe try "reindex index pg_toast.pg_toast_17760_index"

> ler=# reindex (verbose) database ler;
[...]
> ERROR:  index "pg_toast_17760_index" contains unexpected zero page at block
> 23686
> HINT:  Please REINDEX it.

I suspect REINDEX is trying to access that index for some other reason
than reindexing it.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: How am I supposed to fix this?

From
Peter Geoghegan
Date:
On Tue, Aug 6, 2019 at 10:19 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> The question is how much other data corruption is there ...

Larry could try running amcheck on the other indexes. Just the basic
bt_check_index() checks should be enough to detect problems like this.
They can be run fairly non-disruptively. Something like this should do
it:

SELECT bt_index_check(index => c.oid),
               c.relname,
               c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree'
-- Don't check temp tables, which may be from another session:
AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC;

If this takes too long, you can always adjust the query to only verify
system indexes or TOAST indexes.

-- 
Peter Geoghegan



Re: How am I supposed to fix this?

From
Larry Rosenman
Date:
On 08/06/2019 12:30 pm, Peter Geoghegan wrote:
> On Tue, Aug 6, 2019 at 10:19 AM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> The question is how much other data corruption is there ...
> 
> Larry could try running amcheck on the other indexes. Just the basic
> bt_check_index() checks should be enough to detect problems like this.
> They can be run fairly non-disruptively. Something like this should do
> it:
> 
> SELECT bt_index_check(index => c.oid),
>                c.relname,
>                c.relpages
> FROM pg_index i
> JOIN pg_opclass op ON i.indclass[0] = op.oid
> JOIN pg_am am ON op.opcmethod = am.oid
> JOIN pg_class c ON i.indexrelid = c.oid
> JOIN pg_namespace n ON c.relnamespace = n.oid
> WHERE am.amname = 'btree'
> -- Don't check temp tables, which may be from another session:
> AND c.relpersistence != 't'
> -- Function may throw an error when this is omitted:
> AND c.relkind = 'i' AND i.indisready AND i.indisvalid
> ORDER BY c.relpages DESC;
> 
> If this takes too long, you can always adjust the query to only verify
> system indexes or TOAST indexes.
ler=# SELECT bt_index_check(index => c.oid),
ler-#                c.relname,
ler-#                c.relpages
ler-# FROM pg_index i
ler-# JOIN pg_opclass op ON i.indclass[0] = op.oid
ler-# JOIN pg_am am ON op.opcmethod = am.oid
ler-# JOIN pg_class c ON i.indexrelid = c.oid
ler-# JOIN pg_namespace n ON c.relnamespace = n.oid
ler-# WHERE am.amname = 'btree'
ler-# -- Don't check temp tables, which may be from another session:
ler-# AND c.relpersistence != 't'
ler-# -- Function may throw an error when this is omitted:
ler-# AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ler-# ORDER BY c.relpages DESC;
ERROR:  function bt_index_check(index => oid) does not exist
LINE 1: SELECT bt_index_check(index => c.oid),
                ^
HINT:  No function matches the given name and argument types. You might 
need to add explicit type casts.
ler=#
-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 214-642-9640                 E-Mail: ler@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: How am I supposed to fix this?

From
Peter Geoghegan
Date:
On Tue, Aug 6, 2019 at 10:34 AM Larry Rosenman <ler@lerctr.org> wrote:
> ERROR:  function bt_index_check(index => oid) does not exist
> LINE 1: SELECT bt_index_check(index => c.oid),
>                 ^
> HINT:  No function matches the given name and argument types. You might
> need to add explicit type casts.

It's a contrib extension, so you have to "create extension amcheck" first.
-- 
Peter Geoghegan



Re: How am I supposed to fix this?

From
Larry Rosenman
Date:
On 08/06/2019 12:35 pm, Peter Geoghegan wrote:
> On Tue, Aug 6, 2019 at 10:34 AM Larry Rosenman <ler@lerctr.org> wrote:
>> ERROR:  function bt_index_check(index => oid) does not exist
>> LINE 1: SELECT bt_index_check(index => c.oid),
>>                 ^
>> HINT:  No function matches the given name and argument types. You 
>> might
>> need to add explicit type casts.
> 
> It's a contrib extension, so you have to "create extension amcheck" 
> first.


the check is running (this is a HUGE table).

For the initial error, it would be nice if:
1) the pg_toast schema was mentioned
or
2) reindex searched pg_toast as well.

I did do the reindex pg_toast. index.


-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 214-642-9640                 E-Mail: ler@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: How am I supposed to fix this?

From
Larry Rosenman
Date:
On 08/06/2019 12:45 pm, Larry Rosenman wrote:
> On 08/06/2019 12:35 pm, Peter Geoghegan wrote:
>> On Tue, Aug 6, 2019 at 10:34 AM Larry Rosenman <ler@lerctr.org> wrote:
>>> ERROR:  function bt_index_check(index => oid) does not exist
>>> LINE 1: SELECT bt_index_check(index => c.oid),
>>>                 ^
>>> HINT:  No function matches the given name and argument types. You 
>>> might
>>> need to add explicit type casts.
>> 
>> It's a contrib extension, so you have to "create extension amcheck" 
>> first.
> 
> 
> the check is running (this is a HUGE table).
> 
> For the initial error, it would be nice if:
> 1) the pg_toast schema was mentioned
> or
> 2) reindex searched pg_toast as well.
> 
> I did do the reindex pg_toast. index.

As a followup, btcheck found another index that had issues, and a toast 
table was missing a chunk.

I have ALL the data I used to create this table still around so I just 
dropped it and am reloading the data.

I still think that the error message should mention the fully qualified 
index name.


-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 214-642-9640                 E-Mail: ler@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: How am I supposed to fix this?

From
Peter Geoghegan
Date:
On Tue, Aug 6, 2019 at 11:11 AM Larry Rosenman <ler@lerctr.org> wrote:
> As a followup, btcheck found another index that had issues, and a toast
> table was missing a chunk.
>
> I have ALL the data I used to create this table still around so I just
> dropped it and am reloading the data.

It sounds like there is a generic storage issue at play here. Often
TOAST data is the apparent first thing that gets corrupted, because
that's only because the inconsistencies are relatively obvious.

I suggest that you rerun amcheck using the same query, though this
time specify "heapallindexed=true" to bt_check_index(). Increase
maintenance_work_mem if it's set to a low value first (ideally you can
crank it up to 600MB). This type of verification will take a lot
longer, but will find more subtle inconsistencies that could easily be
missed.

Please let us know how this goes. I am always keen to hear about how
much the tooling helps in the real world.

-- 
Peter Geoghegan



Re: How am I supposed to fix this?

From
Larry Rosenman
Date:
On 08/06/2019 1:16 pm, Peter Geoghegan wrote:
> On Tue, Aug 6, 2019 at 11:11 AM Larry Rosenman <ler@lerctr.org> wrote:
>> As a followup, btcheck found another index that had issues, and a 
>> toast
>> table was missing a chunk.
>> 
>> I have ALL the data I used to create this table still around so I just
>> dropped it and am reloading the data.
> 
> It sounds like there is a generic storage issue at play here. Often
> TOAST data is the apparent first thing that gets corrupted, because
> that's only because the inconsistencies are relatively obvious.
> 
> I suggest that you rerun amcheck using the same query, though this
> time specify "heapallindexed=true" to bt_check_index(). Increase
> maintenance_work_mem if it's set to a low value first (ideally you can
> crank it up to 600MB). This type of verification will take a lot
> longer, but will find more subtle inconsistencies that could easily be
> missed.
> 
> Please let us know how this goes. I am always keen to hear about how
> much the tooling helps in the real world.

I've already dropped and re-created the table involved (a metric crapton
of DNS queries). I know why and how this happened as well.  I had to
fully restore my system, and bacula didn't catch all the data etc since 
it
was being modified, and I didn't do the smart thing then and restore 
from a pg_dump.


-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 214-642-9640                 E-Mail: ler@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106