Thread: Re: PostgreSQL 'Corruption & Fragmentation' detection andresolution/fix

Hi all,

I'm searching a way to detect postgresql corruption on a daily basis. Please
provide me 

1) a script which detects the corruption in postgres database/instance,

2) and the remedy steps to fix the issue 

Searched in lot of links and blogs but unable to find a concrete solution

And also it will be a very great  help to me if I can get in the same way
for fragmentation.

Please help me regarding this, since it has been a very long time working on
this.

Thanks in advance.


Regards,
Pavan



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-admin-f2076596.html


Re: PostgreSQL 'Corruption & Fragmentation' detection andresolution/fix

From
Fabio Pardi
Date:

Hi Pavan,

What over my experience turned out to be (incidentally) useful is a vacuum over the whole database(s). I have one that runs every night and when corruption happens you will notice it because vacuum will die, hopefully noticing you.


As per the fix, I m not sure. I think that if you find curruption, you should first understand what is corrupted. If you are lucky, is an index, and a recreation will fix it for you. But if the corruption happens on real data, then the problem must be approached in some other way(s).


regards,

fabio pardi


On 11/06/18 15:28, pavan95 wrote:
Hi all,

I'm searching a way to detect postgresql corruption on a daily basis. Please
provide me 

1) a script which detects the corruption in postgres database/instance,

2) and the remedy steps to fix the issue 

Searched in lot of links and blogs but unable to find a concrete solution

And also it will be a very great  help to me if I can get in the same way
for fragmentation.

Please help me regarding this, since it has been a very long time working on
this.

Thanks in advance.


Regards,
Pavan



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-admin-f2076596.html


Re: PostgreSQL 'Corruption & Fragmentation' detection andresolution/fix

From
Rui DeSousa
Date:
> On Jun 11, 2018, at 9:28 AM, pavan95 <pavan.postgresdba@gmail.com> wrote:
>
> I'm searching a way to detect postgresql corruption on a daily basis. Please
> provide me
>

If you haven’t already (show data_checksums); I would recommend turning on data checksums to allow Postgres to detect
i/odata corruption issues.   

If it’s not enabled then you’ll have to reinit your database:  initdb —data-checksums …

I would be very concerned why you feel the need to check for corruption and would question your platform.  The only
timeI’ve seen corruption was due to a disk subsystem problem which data-checksums should help flush out for you or a
Postgresbug which would normally gets bubbled up via the application and/or the logs. 

Make sure you have good backups; there really is no script that going to solve a faulty disk subsystem or other
disaster.

Re: PostgreSQL 'Corruption & Fragmentation' detection and resolution/fix

From
Pavan Teja
Date:
 


On Mon, Jun 11, 2018, 10:22 PM Rui DeSousa <rui.desousa@icloud.com> wrote:

> On Jun 11, 2018, at 9:28 AM, pavan95 <pavan.postgresdba@gmail.com> wrote:
>
> I'm searching a way to detect postgresql corruption on a daily basis. Please
> provide me
>

If you haven’t already (show data_checksums); I would recommend turning on data checksums to allow Postgres to detect i/o data corruption issues. 

If it’s not enabled then you’ll have to reinit your database:  initdb —data-checksums …

I would be very concerned why you feel the need to check for corruption and would question your platform.  The only time I’ve seen corruption was due to a disk subsystem problem which data-checksums should help flush out for you or a Postgres bug which would normally gets bubbled up via the application and/or the logs.

Make sure you have good backups; there really is no script that going to solve a faulty disk subsystem or other disaster.

Thanks Rui,

Actually found some issues like segmentation fault with sigssv 11. I'm concerned about to make my database healthy against critical problems mainly in case of production environment.  Also what's the instant thing that one needs to do in case of corrupted data.

Also it will be greatful if you can suggest me a script to find fragmentation

Regards,
Pavan

Re: PostgreSQL 'Corruption & Fragmentation' detection and resolution/fix

From
Peter Geoghegan
Date:
On Mon, Jun 11, 2018 at 6:28 AM, pavan95 <pavan.postgresdba@gmail.com> wrote:
> I'm searching a way to detect postgresql corruption on a daily basis. Please
> provide me

Check out amcheck:

https://github.com/petergeoghegan/amcheck

The version on Github has feature parity with the version from
Postgres 11 (it has the new "heapallindexed" stuff), but it works for
older versions of Postgres. There are official PGDG .deb and RPM
packages available.

-- 
Peter Geoghegan


Re: PostgreSQL 'Corruption & Fragmentation' detection andresolution/fix

From
Rui DeSousa
Date:

> On Jun 11, 2018, at 12:58 PM, Pavan Teja <pavan.postgresdba@gmail.com> wrote:
>
> Actually found some issues like segmentation fault with sigssv 11. I'm concerned about to make my database healthy
againstcritical problems mainly in case of production environment.  Also what's the instant thing that one needs to do
incase of corrupted data. 
>

Postgres should not be experiencing segmentation faults… what is your system configuration as it appear you may have a
problemwith your system build? 

Postgres should not corrupt due to a system crash — if it did; I wouldn’t be using it.  In fact Postgres files are
alwaysin an inconstant state and is why we have WAL files; with the data files and WAL files one ends up with a
consistentdatabase regardless of how or why Postgres crashed.   

What to do? That depends on the issue.  If you end up with a page corruption then you need to determine which objects
areaffected.  If it’s an index page you can simply rebuild it; if it’s a data page; then you need to go to you backups
andextract the given data — and worse case a full restore.  And I would also question why you ended up with a corrupted
pagebecause that shouldn’t happen unless you have a faulty subsystem or a bug. 

The error may report a oid and in that case you can use oid2name to get name; etc.

> Also it will be greatful if you can suggest me a script to find fragmentation
>

Do you mean bloat? There is a script floating around on Postgres’s Wiki that sorta works .

Personally, I cringe at bloat seekers… you end up chasing a fictional issue that doesn’t exist.  It’s OK to have bloat…
pageswill get reused and vacuum full is not a production friendly task nor are the reorg solutions.  I well regulated
systemwill have an own equilibrium; seeking bloat and full vacuums are counter productive.   

Reorg your database is a sure way to introduce data corruption — you need to trust that third party solution with your
data. I can tell you that I have inherited corrupted databases from so-called DBAs that have run some sort of reorg on
thedatabase and left the database in that state.   

Full vacuum should only be use to handle and problems cases where table fell out of its equilibrium due to bad query,
datacleanup, etc…  

Re: PostgreSQL 'Corruption & Fragmentation' detection and resolution/fix

From
Pavan Teja
Date:
Thank you for your response.  

So finally there's no script to determine corruption well in advance?? Correct??

Regards,
Pavan

On Mon, Jun 11, 2018, 11:20 PM Rui DeSousa <rui.desousa@icloud.com> wrote:


> On Jun 11, 2018, at 12:58 PM, Pavan Teja <pavan.postgresdba@gmail.com> wrote:
>
> Actually found some issues like segmentation fault with sigssv 11. I'm concerned about to make my database healthy against critical problems mainly in case of production environment.  Also what's the instant thing that one needs to do in case of corrupted data.
>

Postgres should not be experiencing segmentation faults… what is your system configuration as it appear you may have a problem with your system build?

Postgres should not corrupt due to a system crash — if it did; I wouldn’t be using it.  In fact Postgres files are always in an inconstant state and is why we have WAL files; with the data files and WAL files one ends up with a consistent database regardless of how or why Postgres crashed. 

What to do? That depends on the issue.  If you end up with a page corruption then you need to determine which objects are affected.  If it’s an index page you can simply rebuild it; if it’s a data page; then you need to go to you backups and extract the given data — and worse case a full restore.  And I would also question why you ended up with a corrupted page because that shouldn’t happen unless you have a faulty subsystem or a bug.

The error may report a oid and in that case you can use oid2name to get name; etc.

> Also it will be greatful if you can suggest me a script to find fragmentation
>

Do you mean bloat? There is a script floating around on Postgres’s Wiki that sorta works . 

Personally, I cringe at bloat seekers… you end up chasing a fictional issue that doesn’t exist.  It’s OK to have bloat… pages will get reused and vacuum full is not a production friendly task nor are the reorg solutions.  I well regulated system will have an own equilibrium; seeking bloat and full vacuums are counter productive. 

Reorg your database is a sure way to introduce data corruption — you need to trust that third party solution with your data.  I can tell you that I have inherited corrupted databases from so-called DBAs that have run some sort of reorg on the database and left the database in that state. 

Full vacuum should only be use to handle and problems cases where table fell out of its equilibrium due to bad query, data cleanup, etc…

Re: PostgreSQL 'Corruption & Fragmentation' detection andresolution/fix

From
Rui DeSousa
Date:
> On Jun 11, 2018, at 1:57 PM, Pavan Teja <pavan.postgresdba@gmail.com> wrote:
>
> So finally there's no script to determine corruption well in advance?? Correct??

It is your responsibility to make sure that the system is solid and all worst cases are covered along with striving for
thefive 9’s.  You need to building a system that you can trust which includes making sure you disk subsystem is really
andI mean really reliable.  RAID is the most falsely trusted system around; so you really need to know the subsystems
inyour system especially when subsystems that break fsync().  

Bit rot is real and if you system doesn’t handle it then that will lead to data corruption.  Postgres will only be able
totell you that it occurred if you have data_checksums feature enabled.  Your subsystem should handle it and actively
bechecking for it; however, most RAID systems don’t or fail to do a good job at it. 

Like I stated already; If I couldn’t entrust the data integrity of Postgres I would not be using it and would be
runningOracle instead. 

Re: PostgreSQL 'Corruption & Fragmentation' detection and resolution/fix

From
Pavan Teja
Date:
Hi Rui,

Thank you for the clarification. 

On Mon, Jun 11, 2018, 11:46 PM Rui DeSousa <rui.desousa@icloud.com> wrote:

> On Jun 11, 2018, at 1:57 PM, Pavan Teja <pavan.postgresdba@gmail.com> wrote:
>
> So finally there's no script to determine corruption well in advance?? Correct??

It is your responsibility to make sure that the system is solid and all worst cases are covered along with striving for the five 9’s.  You need to building a system that you can trust which includes making sure you disk subsystem is really and I mean really reliable.  RAID is the most falsely trusted system around; so you really need to know the subsystems in your system especially when subsystems that break fsync().

Bit rot is real and if you system doesn’t handle it then that will lead to data corruption.  Postgres will only be able to tell you that it occurred if you have data_checksums feature enabled.  Your subsystem should handle it and actively be checking for it; however, most RAID systems don’t or fail to do a good job at it.

Like I stated already; If I couldn’t entrust the data integrity of Postgres I would not be using it and would be running Oracle instead.
Hi,

Can we get any information regarding fragmentation/corruption in the
database level from the system table 

"Information_schema.columns" ?  

Actually my strong belief is  other databases like SQL server, Oracle &
Mysql are providing some system views to detect corruption and fragmentation
from the database. So thinking the same with the postgres.

As I was asked to prepare a script well in advance to predict the
forthcoming disasters

Regards,
Pavan



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-admin-f2076596.html


Re: PostgreSQL 'Corruption & Fragmentation' detection andresolution/fix

From
Mark Kirkwood
Date:
Re corruption, For heap (i.e table) the standard prescription is just to 
pg_dump sending output to /dev/null. This will bail if there are any 
corrupted pages. Also (as probably mentioned elsewhere), enable data 
checksums at initdb time and the above pg_dump will validate these too.

With respect to indexes - the latest versions of the amcheck extension 
verify these.

Re fragmentation, the usual problem with a Postgres db is not really 
classical 'fragmentation' but data bloat due to insufficient VACUUM. The 
storage model for Postgres uses 'copy on write' so presents a different 
problem path from pretty much every other DBMS.

Cheers

Mark


On 12/06/18 18:09, pavan95 wrote:
> Hi,
>
> Can we get any information regarding fragmentation/corruption in the
> database level from the system table
>
> "Information_schema.columns" ?
>
> Actually my strong belief is  other databases like SQL server, Oracle &
> Mysql are providing some system views to detect corruption and fragmentation
> from the database. So thinking the same with the postgres.
>
> As I was asked to prepare a script well in advance to predict the
> forthcoming disasters
>
> Regards,
> Pavan
>
>
>
> --
> Sent from: http://www.postgresql-archive.org/PostgreSQL-admin-f2076596.html
>



Re: PostgreSQL 'Corruption & Fragmentation' detection andresolution/fix

From
Thomas Kellerer
Date:
pavan95 schrieb am 12.06.2018 um 08:09:
> Can we get any information regarding fragmentation/corruption in the
> database level from the system table 

Fragmentation and corruption are two *very* different things. 

And with modern harddisks (SSD and similar technologies) fragmentation is not an issue any more




Re: PostgreSQL 'Corruption & Fragmentation' detection andresolution/fix

From
Rui DeSousa
Date:
Paven,

SQL Server (derived from Sybase) and Oracle where designed in a time when databases used RAW devices directly thus did
notuse a filesystem.  In contrast Postgres was designed to work with a filesystem; thus, a file only represents a
singledatabase object.  There is no fragmentation in the likes of Oracle/Sybase/MSSQL where database objects of varying
extentsizes are interwoven within the same file.  These database systems then need to identify the free space, coalesce
it,and return it to the free heap map — resulting in varying sizes of free space chunks.  Those systems basically had
toreimplement all the features of a filesystem. 

In Postgres you have free space in pages which also includes dead tuples on a page and completely free pages — this
representsthe bloat and the script provided on the wiki will give you an estimate of its size.  You can also use
pgstattuplefrom contrib to get the information. 

For deep inspection you can use pageinspect and pg_buffercache from contrib to view the internals workings… but you
probabledon’t want to start poking around the buffer cache on a live system. 

I really think that you need to start talking with your storage vendor and figure out what is actually happening there.
As Mark pointed out; Postgres uses COW semantics to support MVCC; and you’d be surprised to know that some enterprise
storagevendors also use COW semantics in their storage system making fragmentation really hard to quantify with all the
layers. Your storage is a chorus of layers; the internal storage of Postgres (COW/MVCC), the filesystem used?, volume
management?,storage devices?, etc.  — fragmentation happens at levels and there are tools there to assess/correct it
there.

Note that DBCC was never a selling point and neither is fsck; the fact that those tools are needed is a problem.

I think you also need to address your segmentation faults issue as that should not be occurring and is unacceptable.

Choose your filesystem and storage system wisely…



> On Jun 12, 2018, at 2:09 AM, pavan95 <pavan.postgresdba@gmail.com> wrote:
>
> Hi,
>
> Can we get any information regarding fragmentation/corruption in the
> database level from the system table
>
> "Information_schema.columns" ?
>
> Actually my strong belief is  other databases like SQL server, Oracle &
> Mysql are providing some system views to detect corruption and fragmentation
> from the database. So thinking the same with the postgres.
>
> As I was asked to prepare a script well in advance to predict the
> forthcoming disasters
>
> Regards,
> Pavan
>
>
>
> --
> Sent from: http://www.postgresql-archive.org/PostgreSQL-admin-f2076596.html
>



Re: PostgreSQL 'Corruption & Fragmentation' detection and resolution/fix

From
Peter Geoghegan
Date:
On Tue, Jun 12, 2018 at 8:21 AM, Rui DeSousa <rui.desousa@icloud.com> wrote:
> Note that DBCC was never a selling point and neither is fsck; the fact that those tools are needed is a problem.

DBCC is a selling point. "Parallel consistency check" is a feature
that is only available in SQL Server enterprise edition.

PostgreSQL has comparable tooling - contrib/amcheck. As I said,
amcheck is available as an external project for Postgres versions
prior to 10.

-- 
Peter Geoghegan


Re: PostgreSQL 'Corruption & Fragmentation' detection andresolution/fix

From
Rui DeSousa
Date:

> On Jun 12, 2018, at 12:16 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Tue, Jun 12, 2018 at 8:21 AM, Rui DeSousa <rui.desousa@icloud.com> wrote:
>> Note that DBCC was never a selling point and neither is fsck; the fact that those tools are needed is a problem.
>
> DBCC is a selling point. "Parallel consistency check" is a feature
> that is only available in SQL Server enterprise edition.

That’s not what I recall; it was clearly used against it in sales pitches.  Do you really want to trust a database that
requiresyou to DBCC checkdb periodically and fix corruption data pages or do you want to trust Oracle with data? 

Would you use a filesystem today that required you to fsck after a crash?

Sure DBCC has useful features like tracing, etc. — it’s not just for corruption.



Re: PostgreSQL 'Corruption & Fragmentation' detection and resolution/fix

From
Peter Geoghegan
Date:
On Tue, Jun 12, 2018 at 9:46 AM, Rui DeSousa <rui.desousa@icloud.com> wrote:
> That’s not what I recall; it was clearly used against it in sales pitches.  Do you really want to trust a database
thatrequires you to DBCC checkdb periodically and fix corruption data pages or do you want to trust Oracle with data? 

Oracle has comparable corruption detection tooling, which is also
quite extensive. It can perform verification online and offline, when
performing backups, and many other such things.

--
Peter Geoghegan


Re: PostgreSQL 'Corruption & Fragmentation' detection andresolution/fix

From
Rui DeSousa
Date:
>
> Oracle has comparable corruption detection tooling, which is also
> quite extensive. It can perform verification online and offline, when
> performing backups, and many other such things.
>

Yeah but having page collisions was not uncommon and I don’t think I ever heard of a page collision in Oracle.




Re: PostgreSQL 'Corruption & Fragmentation' detection andresolution/fix

From
Rui DeSousa
Date:
Peter,

I installed the contrib/amcheck that your recommended and read up on some of the email threads, etc.  I found it
interestingfrom the email threads that different glib versions change the collation order causing indexes to get
flagged. Looks to be a great tool to help validate system upgrades. 



Thank you guys, for your valuable time.



Regards,
Pavan



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-admin-f2076596.html