Re: After replication failover: could not read block X in file Y read only 0 of 8192 bytes - Mailing list pgsql-general

From Brian Sutherland
Subject Re: After replication failover: could not read block X in file Y read only 0 of 8192 bytes
Date
Msg-id 20160531092212.GA63217@Admins-MacBook-Air-2.local
In response to Re: After replication failover: could not read block X in file Y read only 0 of 8192 bytes  (Venkata Balaji N <nag1010@gmail.com>)
List pgsql-general
On Tue, May 31, 2016 at 04:49:26PM +1000, Venkata Balaji N wrote:
> On Mon, May 30, 2016 at 11:37 PM, Brian Sutherland <brian@vanguardistas.net>
> wrote:
>
> > I'm running a streaming replication setup with PostgreSQL 9.5.2 and have
> > started seeing these errors on a few INSERTs:
> >
> >     ERROR:  could not read block 8 in file "base/3884037/3885279": read
> > only 0 of 8192 bytes
> >
>
> These errors are occurring on master or slave ?

On the master (which was previously a slave)

> > on a few tables. If I look at that specific file, it's only 6 blocks
> > long:
> >
> >     # ls -la base/3884037/3885279
> >     -rw------- 1 postgres postgres 49152 May 30 12:56 base/3884037/3885279
> >
> > It seems that this is the case on most tables in this state. I haven't
> > seen any error on SELECT and I can SELECT * on all the tables I know
> > have this problem. The database machine is under reasonable load.
> >
>
> So, the filenodes generating this error belong to a Table ? or an Index ?

So far I have found 3 tables with this issue; 2 of them were pg_statistic
in different databases. The one referenced above is definitely a table:
"design_file".
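For the record, this is how the "table or index?" question can be checked
from the filenode in the error message (a sketch only; the database name
and filenode are placeholders, and pg_filenode_relation() is available
since 9.4):

```shell
# Sketch: map a filenode back to the relation it belongs to.
# relkind 'r' means a plain table, 'i' means an index.
filenode_to_relation() {
    local db="$1" filenode="$2"
    # first argument 0 = the database's default tablespace
    psql -d "$db" -Atc "SELECT pg_filenode_relation(0, $filenode);"
    psql -d "$db" -Atc "SELECT relkind FROM pg_class WHERE relfilenode = $filenode;"
}
# usage (placeholder values): filenode_to_relation mydb 3885279
```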

The usage pattern on that table is to DELETE and later INSERT a few
hundred rows at a time on an occasional basis. The table is very small,
680 rows.
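For reference, the error is consistent with the file size shown earlier:
block numbers are zero-based, so reading block 8 would require a file of
at least 9 blocks, while this one holds only 6:

```shell
# block arithmetic for base/3884037/3885279
FILE_SIZE=49152   # from: ls -la base/3884037/3885279
BLOCK_SIZE=8192
BLOCKS=$((FILE_SIZE / BLOCK_SIZE))
echo "file holds blocks 0..$((BLOCKS - 1)) ($BLOCKS blocks); the error wanted block 8"
# prints: file holds blocks 0..5 (6 blocks); the error wanted block 8
```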

> > On some tables an "ANALYZE tablename" causes the error.

I discovered why ANALYZE raised an error: pg_statistic itself was
affected. "vacuum full verbose pg_statistic;" fixed it. I'm hoping any
missing statistics get regenerated.
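Roughly what I ran, as a sketch (the database name is a placeholder; it
would need to be repeated for each affected database, and the ANALYZE
regenerates the statistics rows the rewrite discarded):

```shell
# Sketch of the recovery sequence: rewrite pg_statistic, then rebuild stats.
recover_stats() {
    local db="$1"
    psql -d "$db" -c 'VACUUM FULL VERBOSE pg_statistic;'
    psql -d "$db" -c 'ANALYZE;'
}
# usage (placeholder value): recover_stats mydb
```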

> > We recently had a streaming replication failover after loading a large
> > amount of data with pg_restore. The problems seem to have started after
> > that, but I'm not perfectly sure.
>
> pg_restore has completed successfully ?

pg_restore did complete successfully

> When pg_restore was running, did
> you see anything suspicious in the postgresql logfiles ?

The restore happened on the old master. The logfile was long since
deleted :(

> > I have data_checksums switched on so am suspecting a streaming
> > replication bug.  Anyone know of a recent bug which could have caused
> > this?
> >
>
> I cannot conclude at this point. I encountered these kinds of errors with
> indexes, and re-indexing fixed them.

This is actually the second time I have seen these kinds of errors. In
the past, after verifying that no data was lost, I used VACUUM FULL to
recover the ability to INSERT. There was no pitchfork uprising...

> Regards,
> Venkata B N
>
> Fujitsu Australia

--
Brian Sutherland

