Re: multiple tables got corrupted - Mailing list pgsql-general

From Jerry Sievers
Subject Re: multiple tables got corrupted
Date
Msg-id 87d02jkwk1.fsf@jsievers.enova.com
In response to multiple tables got corrupted  (Vasu Madhineni <vasumdba1515@gmail.com>)
List pgsql-general
Vasu Madhineni <vasumdba1515@gmail.com> writes:

> Hi Magnus,
>
> Thanks for your update.
> If I run the command below to identify the number of corrupted
> tables in the database, will there be any impact on other tables in
> the production environment?
>
> "pg_dump -f /dev/null database"

Consider using pg_dump or any other means to dump *each* table
individually.

pg_dump is going to abort on the first case of corruption in any table
that results in a read error on a full scan, so in a scenario where
multiple corrupt tables are likely, you're not going to get far with a
monolithic approach.
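
Something along these lines would do it (a rough sketch only; it
assumes the database is called moh_fa as in your psql prompt, and that
connection options are set in the environment):

    #!/bin/sh
    # Dump each user table separately, so one corrupt table does not
    # abort the whole run; record which ones fail.
    DB=moh_fa
    psql -At -d "$DB" -c "SELECT format('%I.%I', schemaname, tablename)
                          FROM pg_tables
                          WHERE schemaname NOT IN
                                ('pg_catalog', 'information_schema')" |
    while read -r t
    do
        if pg_dump -d "$DB" -t "$t" -f /dev/null 2>>dump_errors.log
        then
            echo "OK:   $t"
        else
            echo "FAIL: $t"   # a read error here usually means corruption
        fi
    done

As to your question about impact: pg_dump only reads (it takes shared
locks), so it won't modify other tables. It does add I/O load, though,
and on a failing disk any extra reads carry some risk of their own.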



>
> Thanks in advance.
>
> Regards,
> Vasu Madhineni
>
> On Fri, Sep 18, 2020 at 3:42 PM Magnus Hagander <magnus@hagander.net>
> wrote:
>
>     That depends on what the problem is and how they fix it. Most
>     likely yes -- especially since if you haven't enabled data
>     checksums you won't *know* if things are OK or not. So I'd
>     definitely recommend it even if things *look* OK.
>
>     //Magnus
>
>
>     On Wed, Sep 16, 2020 at 5:06 AM Vasu Madhineni <
>     vasumdba1515@gmail.com> wrote:
>
>         I could see block read I/O errors in /var/log/syslog. If
>         those errors are fixed by the OS team, will it require
>         recovery?
>
>         Also, can I use LIMIT and OFFSET to locate corrupted rows?
>
>         Thanks in advance
>
>         Regards,
>         Vasu Madhineni
>
>         On Wed, Sep 16, 2020, 01:58 Magnus Hagander <
>         magnus@hagander.net> wrote:
>
>             Try reading them "row by row" until it breaks. That is,
>             SELECT * FROM ... LIMIT 1, then LIMIT 2, etc. To make it
>             faster, use a binary search, starting from a reasonable
>             guess based on the size of the table vs. the first
>             failed block, but the principle is the same. Once it
>             fails, you've found a corrupt block...
>
>             //Magnus
>
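
Concretely, that bisection can be driven from psql by discarding the
query output (the table name and row counts here are hypothetical):

    \o /dev/null
    SELECT * FROM mytable LIMIT 500000;  -- succeeds: corruption is later
    SELECT * FROM mytable LIMIT 750000;  -- fails: corruption is somewhere
                                         -- in rows 500001..750000; keep halving
    \o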
>
>             On Tue, Sep 15, 2020 at 12:46 PM Vasu Madhineni <
>             vasumdba1515@gmail.com> wrote:
>
>                 Is it possible to identify which rows are corrupted
>                 in particular tables?
>
>                 On Tue, Sep 15, 2020 at 5:36 PM Magnus Hagander <
>                 magnus@hagander.net> wrote:
>
>
>
>                     On Tue, Sep 15, 2020 at 11:15 AM Vasu Madhineni <
>                     vasumdba1515@gmail.com> wrote:
>
>                         Hi All,
>
>                         In one of my Postgres databases multiple
>                         tables got corrupted. I followed the steps
>                         below, but still get the same error:
>
>                         1. SET zero_damaged_pages = on
>                         2. VACUUM ANALYZE, VACUUM FULL
>
>
>
>                     That is a very destructive first attempt. I hope
>                     you took a full disk-level backup of the database
>                     before you did that, as it can ruin your chances
>                     for forensics and data recovery for other issues.
>
>
>
>                         moh_fa=# VACUUM FULL;
>                         ERROR:  could not read block 9350 in file
>                         "base/1156523/1270812": Input/output error
>
>                         I tried to take a backup of the tables with
>                         pg_dump, but got the same error. The files
>                         exist physically in the base location.
>
>                         How do I proceed with this? There is no
>                         backup to restore.
>
>
>
>                     This is clearly some sort of disk error, and with
>                     no backups to restore you will definitely be
>                     losing data.
>
>                     I'd start by figuring out which tables have no
>                     corruption and do work, and back those up (with
>                     pg_dump for example) as soon as possible to a
>                     different machine -- since it's not exactly
>                     unlikely that further disk errors will appear.
>
>                     Once you've done that, identify the corrupt
>                     tables, and then try to do partial recovery. For
>                     example, if you look at the file 1270812, how
>                     big is it? PostgreSQL is failing to read block
>                     9350, which is 76595200 bytes into the file. If
>                     this is at the very end of the file, you can for
>                     example try to get the data out up to that point
>                     with LIMIT. If it's in the middle of the file,
>                     it gets trickier, but similar approaches can be
>                     used.
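
To make the arithmetic explicit: the default block size is 8192 bytes,
and 9350 * 8192 = 76595200, which is where that offset comes from.
Compare it with the on-disk size, e.g. ls -l $PGDATA/base/1156523/1270812.
If the damage is at the tail, the readable prefix can be salvaged with
the last LIMIT that succeeded in the bisection above; a sketch (the
table name and row count are hypothetical):

    SET synchronize_seqscans = off;  -- make the scan start at block 0
    CREATE TABLE salvage AS
        SELECT * FROM corrupt_table LIMIT 1234567;  -- last working LIMIT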
>
>                     Also, unless you are running with data checksums
>                     enabled, I wouldn't fully trust the data in the
>                     tables that you *can* read either. Since you
>                     clearly have disk issues, they may have caused
>                     corruption elsewhere as well, so whatever
>                     verification you can do against other tables, you
>                     should do as well.
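
(For the future: checksums can be enabled at initdb time with
--data-checksums, or offline on v12+ with pg_checksums, run against a
cleanly shut down cluster:

    pg_checksums --enable -D "$PGDATA"

That only helps detect corruption from then on, of course; it cannot
validate data that was already damaged when checksums were enabled.)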
>
>
>                     You'll of course also want to check any kernel
>                     logs or storage system logs to see if they can
>                     give you a hint as to what happened, but they are
>                     unlikely to actually give you something that will
>                     help you fix the problem.
>
>
>
>

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net


