Thread: Re: PostgreSQL 'Corruption & Fragmentation' detection and resolution/fix
Hi all,

I'm searching for a way to detect PostgreSQL corruption on a daily basis. Please provide me:

1) a script which detects corruption in a Postgres database/instance, and
2) the remedy steps to fix the issue.

I have searched a lot of links and blogs but was unable to find a concrete solution. It would also be a very great help if I could get the same for fragmentation. Please help me with this, since I have been working on it for a very long time. Thanks in advance.

Regards,
Pavan

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-admin-f2076596.html
Hi Pavan,
What in my experience turned out to be (incidentally) useful is a vacuum over the whole database(s). I have one that runs every night, and when corruption happens you will notice it, because vacuum will die, hopefully notifying you.
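A minimal cron-able sketch of that nightly vacuum canary (the mail address and log path are made-up examples, not from the thread). VACUUM reads every heap page, so page-level corruption usually surfaces as a hard error and a nonzero exit code:

```shell
#!/bin/sh
# Nightly VACUUM canary: vacuum every database and mail the log on failure.
# MAILTO and LOG are hypothetical; adjust for your environment.
set -u
MAILTO="dba@example.com"
LOG="/tmp/nightly_vacuum.log"
VACUUM_CMD="vacuumdb --all --analyze"
if [ "${DRY_RUN:-1}" = "1" ]; then
    # dry run so the sketch can be exercised without a live cluster
    echo "would run: $VACUUM_CMD (failures mailed to $MAILTO)"
else
    $VACUUM_CMD >"$LOG" 2>&1 \
        || mail -s "nightly vacuum FAILED - possible corruption" "$MAILTO" <"$LOG"
fi
```

Run it from cron with DRY_RUN=0; a clean exit means every page was at least readable.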
As for the fix, I'm not sure. I think that if you find corruption, you should first understand what is corrupted. If you are lucky, it is an index, and a recreation will fix it for you. But if the corruption is in real data, then the problem must be approached in some other way(s).
regards,
fabio pardi
On 11/06/18 15:28, pavan95 wrote:
> I'm searching a way to detect postgresql corruption on a daily basis. [...]
On Mon, Jun 11, 2018, 10:22 PM Rui DeSousa <rui.desousa@icloud.com> wrote:
> On Jun 11, 2018, at 9:28 AM, pavan95 <pavan.postgresdba@gmail.com> wrote:
>
> I'm searching a way to detect postgresql corruption on a daily basis. Please
> provide me
>
If you haven’t already (show data_checksums), I would recommend turning on data checksums to allow Postgres to detect I/O data corruption issues.
If it’s not enabled, then you’ll have to reinit your database: initdb --data-checksums …
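A sketch of that check and reinit (enabling checksums on an existing cluster means running initdb into a new data directory and then dump/restoring into it; the path below is a hypothetical example):

```shell
#!/bin/sh
# Check whether data checksums are enabled; if not, init a new cluster
# with them on. NEW_PGDATA is a made-up path; connection defaults assumed.
set -u
CHECK_SQL="show data_checksums"
NEW_PGDATA="/var/lib/pgsql/data_new"   # hypothetical new data directory
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run: psql -Atc '$CHECK_SQL'"
    echo "would run: initdb --data-checksums -D $NEW_PGDATA"
else
    state=$(psql -Atc "$CHECK_SQL")    # prints "on" or "off"
    if [ "$state" != "on" ]; then
        initdb --data-checksums -D "$NEW_PGDATA"
        # ...then dump/restore the old cluster into the new one
    fi
fi
```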
I would be very concerned about why you feel the need to check for corruption, and would question your platform. The only time I’ve seen corruption was due to a disk subsystem problem, which data checksums should help flush out for you, or a Postgres bug, which would normally get bubbled up via the application and/or the logs.
Make sure you have good backups; there really is no script that is going to solve a faulty disk subsystem or other disaster.
Thanks Rui,
Actually, I found some issues like a segmentation fault (SIGSEGV, signal 11). I'm concerned with keeping my database healthy against critical problems, mainly in a production environment. Also, what's the first thing one needs to do in case of corrupted data?
Also, it would be great if you could suggest a script to find fragmentation.
Regards,
Pavan
On Mon, Jun 11, 2018 at 6:28 AM, pavan95 <pavan.postgresdba@gmail.com> wrote:
> I'm searching a way to detect postgresql corruption on a daily basis. Please
> provide me

Check out amcheck: https://github.com/petergeoghegan/amcheck

The version on Github has feature parity with the version from Postgres 11 (it has the new "heapallindexed" stuff), but it works for older versions of Postgres. There are official PGDG .deb and RPM packages available.

--
Peter Geoghegan
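The amcheck suggestion could be scripted along these lines, using bt_index_check() with the heapallindexed argument mentioned above (the database name is a made-up example; amcheck must already be installed):

```shell
#!/bin/sh
# Run amcheck's bt_index_check() over every btree index in the database.
# DB is hypothetical; bt_index_check raises an ERROR on a corrupt index.
set -u
DB="mydb"
SQL="SELECT bt_index_check(c.oid, true)
       FROM pg_index i
       JOIN pg_class c ON c.oid = i.indexrelid
      WHERE c.relam = (SELECT oid FROM pg_am WHERE amname = 'btree');"
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run against $DB:"
    printf '%s\n' "$SQL"
else
    psql -d "$DB" -c "CREATE EXTENSION IF NOT EXISTS amcheck" \
        && psql -d "$DB" -Atc "$SQL"
fi
```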
Thank you for your response.
So, finally, there's no script to determine corruption well in advance? Correct?
Regards,
Pavan
On Mon, Jun 11, 2018, 11:20 PM Rui DeSousa <rui.desousa@icloud.com> wrote:
> On Jun 11, 2018, at 12:58 PM, Pavan Teja <pavan.postgresdba@gmail.com> wrote:
>
> Actually, I found some issues like a segmentation fault (SIGSEGV, signal 11). I'm concerned with keeping my database healthy against critical problems, mainly in a production environment. Also, what's the first thing one needs to do in case of corrupted data?
>
Postgres should not be experiencing segmentation faults… what is your system configuration? It appears you may have a problem with your system build.
Postgres should not corrupt due to a system crash — if it did, I wouldn’t be using it. In fact, Postgres data files are always in an inconsistent state, which is why we have WAL files; with the data files and WAL files, one ends up with a consistent database regardless of how or why Postgres crashed.
What to do? That depends on the issue. If you end up with page corruption, then you need to determine which objects are affected. If it’s an index page, you can simply rebuild it; if it’s a data page, then you need to go to your backups and extract the given data — and, worst case, do a full restore. And I would also question why you ended up with a corrupted page, because that shouldn’t happen unless you have a faulty subsystem or a bug.
The error may report an OID, and in that case you can use oid2name to get the object's name, etc.
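That triage could look something like this sketch: resolve the reported OID with oid2name, then rebuild the object if it turns out to be an index (the OID, database, and index name are made-up examples):

```shell
#!/bin/sh
# Map an OID from an error message to an object name, then REINDEX it.
# All three values below are hypothetical placeholders.
set -u
DB="mydb"
OID=16384                 # OID taken from the error message
BAD_INDEX="orders_pkey"   # index the OID resolved to
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run: oid2name -d $DB -o $OID"
    echo "would run: psql -d $DB -c 'REINDEX INDEX $BAD_INDEX'"
else
    oid2name -d "$DB" -o "$OID"                   # which object is this?
    psql -d "$DB" -c "REINDEX INDEX $BAD_INDEX"   # only safe if it's an index
fi
```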
> Also, it would be great if you could suggest a script to find fragmentation.
>
Do you mean bloat? There is a script floating around on the Postgres wiki that sorta works.
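If you want exact numbers rather than the wiki estimate, pgstattuple from contrib reports dead-tuple and free-space percentages directly; note that it scans the whole table. The table name here is a made-up example:

```shell
#!/bin/sh
# Exact bloat figures for one table via contrib's pgstattuple.
# TABLE is hypothetical; connection defaults assumed.
set -u
TABLE="public.orders"
SQL="SELECT tuple_percent, dead_tuple_percent, free_percent
       FROM pgstattuple('$TABLE');"
if [ "${DRY_RUN:-1}" = "1" ]; then
    printf 'would run: %s\n' "$SQL"
else
    psql -c "CREATE EXTENSION IF NOT EXISTS pgstattuple" && psql -Atc "$SQL"
fi
```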
Personally, I cringe at bloat seekers… you end up chasing a fictional issue that doesn’t exist. It’s OK to have bloat… pages will get reused, and VACUUM FULL is not a production-friendly task, nor are the reorg solutions. A well-regulated system will find its own equilibrium; seeking out bloat and full vacuums is counterproductive.
Reorganizing your database is a sure way to introduce data corruption — you need to trust that third-party solution with your data. I can tell you that I have inherited corrupted databases from so-called DBAs who ran some sort of reorg on the database and left it in that state.
VACUUM FULL should only be used to handle problem cases where a table fell out of its equilibrium due to a bad query, data cleanup, etc…
Hi Rui,
Thank you for the clarification.
On Mon, Jun 11, 2018, 11:46 PM Rui DeSousa <rui.desousa@icloud.com> wrote:
> On Jun 11, 2018, at 1:57 PM, Pavan Teja <pavan.postgresdba@gmail.com> wrote:
>
> So finally there's no script to determine corruption well in advance?? Correct??
It is your responsibility to make sure that the system is solid and all worst cases are covered, along with striving for the five 9’s. You need to build a system that you can trust, which includes making sure your disk subsystem is really, and I mean really, reliable. RAID is the most falsely trusted system around, so you really need to know the subsystems in your system — especially subsystems that break fsync().
Bit rot is real, and if your system doesn’t handle it, that will lead to data corruption. Postgres will only be able to tell you that it occurred if you have the data_checksums feature enabled. Your subsystem should handle it and be actively checking for it; however, most RAID systems don’t, or fail to do a good job at it.
Like I stated already: if I couldn’t entrust data integrity to Postgres, I would not be using it and would be running Oracle instead.
Hi,

Can we get any information regarding fragmentation/corruption at the database level from the system table "information_schema.columns"?

Actually, my strong belief is that other databases like SQL Server, Oracle & MySQL provide some system views to detect corruption and fragmentation from the database, so I was thinking the same of Postgres, as I was asked to prepare a script well in advance to predict forthcoming disasters.

Regards,
Pavan
Re corruption: for the heap (i.e. tables) the standard prescription is just to run pg_dump, sending the output to /dev/null. This will bail if there are any corrupted pages. Also (as probably mentioned elsewhere), enable data checksums at initdb time and the above pg_dump will validate those too. With respect to indexes — the latest versions of the amcheck extension verify these.

Re fragmentation: the usual problem with a Postgres db is not really classical 'fragmentation' but data bloat due to insufficient VACUUM. The storage model for Postgres uses 'copy on write', so it presents a different problem path from pretty much every other DBMS.

Cheers
Mark

On 12/06/18 18:09, pavan95 wrote:
> Can we get any information regarding fragmentation/corruption in the
> database level from the system table "Information_schema.columns"? [...]
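The pg_dump prescription above is easy to put on a schedule; a sketch, with made-up database names, dumping each database to /dev/null purely to force a full read:

```shell
#!/bin/sh
# Corruption sweep: pg_dump each database to /dev/null.
# DBS is a hypothetical list; connection defaults assumed.
set -u
DBS="appdb reportdb"
FAILED=""
for db in $DBS; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: pg_dump $db > /dev/null"
    else
        # pg_dump exits nonzero if it hits a corrupted page (or, with
        # data checksums enabled, a checksum failure)
        pg_dump "$db" >/dev/null || FAILED="$FAILED $db"
    fi
done
[ -z "$FAILED" ] || echo "possible corruption in:$FAILED" >&2
```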
pavan95 wrote on 12.06.2018 at 08:09:
> Can we get any information regarding fragmentation/corruption in the
> database level from the system table

Fragmentation and corruption are two *very* different things. And with modern storage (SSDs and similar technologies), fragmentation is not an issue any more.
Pavan,

SQL Server (derived from Sybase) and Oracle were designed in a time when databases used RAW devices directly and thus did not use a filesystem. In contrast, Postgres was designed to work with a filesystem; thus, a file only represents a single database object. There is no fragmentation like in Oracle/Sybase/MSSQL, where database objects of varying extent sizes are interwoven within the same file. Those database systems then need to identify the free space, coalesce it, and return it to the free heap map — resulting in varying sizes of free space chunks. Those systems basically had to reimplement all the features of a filesystem.

In Postgres you have free space in pages — which also includes dead tuples on a page — and completely free pages; this represents the bloat, and the script provided on the wiki will give you an estimate of its size. You can also use pgstattuple from contrib to get the information.

For deep inspection you can use pageinspect and pg_buffercache from contrib to view the internal workings… but you probably don’t want to start poking around the buffer cache on a live system.

I really think that you need to start talking with your storage vendor and figure out what is actually happening there. As Mark pointed out, Postgres uses COW semantics to support MVCC; and you’d be surprised to know that some enterprise storage vendors also use COW semantics in their storage systems, making fragmentation really hard to quantify across all the layers. Your storage is a chorus of layers: the internal storage of Postgres (COW/MVCC), the filesystem used, volume management, storage devices, etc. — fragmentation happens at each level, and there are tools to assess/correct it there.

Note that DBCC was never a selling point, and neither is fsck; the fact that those tools are needed is a problem.

I think you also need to address your segmentation fault issue, as that should not be occurring and is unacceptable.
Choose your filesystem and storage system wisely…

On Jun 12, 2018, at 2:09 AM, pavan95 <pavan.postgresdba@gmail.com> wrote:
> Can we get any information regarding fragmentation/corruption in the
> database level from the system table "Information_schema.columns"? [...]
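A taste of the "deep inspection" mentioned above: pageinspect can show the raw line pointers and tuple headers of a single block. The table name and block number below are made-up examples; don't do this casually on a production system:

```shell
#!/bin/sh
# Inspect the tuple headers of one heap block with contrib's pageinspect.
# 'public.orders' and block 0 are hypothetical placeholders.
set -u
SQL="SELECT lp, lp_len, t_xmin, t_xmax
       FROM heap_page_items(get_raw_page('public.orders', 0));"
if [ "${DRY_RUN:-1}" = "1" ]; then
    printf 'would run: %s\n' "$SQL"
else
    psql -c "CREATE EXTENSION IF NOT EXISTS pageinspect" && psql -Atc "$SQL"
fi
```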
On Tue, Jun 12, 2018 at 8:21 AM, Rui DeSousa <rui.desousa@icloud.com> wrote:
> Note that DBCC was never a selling point and neither is fsck; the fact that those tools are needed is a problem.

DBCC is a selling point. "Parallel consistency check" is a feature that is only available in SQL Server Enterprise Edition. PostgreSQL has comparable tooling — contrib/amcheck. As I said, amcheck is available as an external project for Postgres versions prior to 10.

--
Peter Geoghegan
> On Jun 12, 2018, at 12:16 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>
> DBCC is a selling point. "Parallel consistency check" is a feature
> that is only available in SQL Server enterprise edition.

That’s not what I recall; it was clearly used against it in sales pitches. Do you really want to trust a database that requires you to run DBCC CHECKDB periodically and fix corrupted data pages, or do you want to trust Oracle with your data?

Would you use a filesystem today that required you to fsck after a crash?

Sure, DBCC has useful features like tracing, etc. — it’s not just for corruption.
On Tue, Jun 12, 2018 at 9:46 AM, Rui DeSousa <rui.desousa@icloud.com> wrote:
> That’s not what I recall; it was clearly used against it in sales pitches. [...]

Oracle has comparable corruption detection tooling, which is also quite extensive. It can perform verification online and offline, when performing backups, and many other such things.

--
Peter Geoghegan
> Oracle has comparable corruption detection tooling, which is also
> quite extensive. It can perform verification online and offline, when
> performing backups, and many other such things.

Yeah, but page collisions were not uncommon, and I don’t think I ever heard of a page collision in Oracle.
Peter,

I installed the contrib/amcheck that you recommended and read up on some of the email threads, etc. I found it interesting from the email threads that different glibc versions change the collation order, causing indexes to get flagged.

Looks to be a great tool to help validate system upgrades.
Thank you, guys, for your valuable time.

Regards,
Pavan