Thread: Suspected Postgres Datacorruption

Suspected Postgres Datacorruption

From
Sumeet Jauhar
Date:

Hi All ,


Can you please help me out with the following questions .

 

Our application is running on Postgres 7.4.X . I agree that this is a very old version of Postgres and we should have upgraded . The issue that we faced is that

 

1 . There was a system crash due to a hardware failure .

 

2 . When the system came up , we tried to insert a few records into the database . However at this point in time we saw that Postgres was taking a lot of CPU & memory .

 

Around 42% CPU consumption . This was a cause of concern .

 

3 . We re-indexed the database and it helped reduce the cpu & memory consumption .

 

My question is

 

A ) Isn’t Postgres database resilient enough to handle hardware system failure ? or it sometime results in a corrupt index for its tables ? I read on the Postgres site that hardware failure can cause corrupt indexes . Besides this are there any other scenario which may result in such corruption .

B) If there has been improvement / enhancements done by Postgres regarding the way it handles corrupt indexes can you please pass me more information  about the bug Id or some documentation on it ? Our application does not do any REINDEXING . I am in a dilemma if we should seriously incorporate it in our application .

 

I ideally want to push to a higher version of Postgres . If I can prove that there will be significant performance benefits and that crashes won’t occur then I will be able to present a strong case .


Since my question is related to Performance & Data corruption i saw on the Postgres site that i should provide the following information


Addition Info :


CPU manufacturer and model : Intel's Itanium Processor 

Do you use a RAID controller? yes


PCIe SAS SmartArray P410i RAID Controller

PCIe SAS SmartArray P411 RAID Controller


Is is Write back caching enabled ?

    Total Cache Size (MB)............... 144
    Read Cache........................ N/A

    Write Cache....................... N/A

No of disks : 4


Have you ever set fsync=off in the postgresql config file?

#fsync = true                   # turns forced synchronization on or off

I never changed it .


Have you had any unexpected power loss lately? Replaced a failed RAID disk? Had an operating system crash?  Yes system crashed had occured .


Hope this information helps .


Regards,

Sumeet


Re: Suspected Postgres Datacorruption

From
Scott Marlowe
Date:
On Wed, Aug 3, 2011 at 1:35 AM, Sumeet Jauhar <sumeet.jauhar@gmail.com> wrote:
>
>
> Our application is running on Postgres 7.4.X . I agree that this is a very
> old version of Postgres and we should have upgraded . The issue that we
> faced is that

Wow, that is a very old version.  It has been out of maintenance for a
long time.  If there are data eating bugs in it they aren't gonna get
fixed.

> 1 . There was a system crash due to a hardware failure .
>
> 2 . When the system came up , we tried to insert a few records into the
> database . However at this point in time we saw that Postgres was taking a
> lot of CPU & memory .
>
> Around 42% CPU consumption . This was a cause of concern .
>
> 3 . We re-indexed the database and it helped reduce the cpu & memory
> consumption .
>
> My question is
>
> A ) Isn’t Postgres database resilient enough to handle hardware system
> failure ? or it sometime results in a corrupt index for its tables ? I read
> on the Postgres site that hardware failure can cause corrupt indexes .
> Besides this are there any other scenario which may result in such
> corruption .

Depends on the hardware failure.  If your RAID controller starts
writing garbage to the drive array, how exactly should postgresql fix
that?  OTOH, if you just have a big boom and the power supply goes
out, most the time you're fine.  Of course, if the drive subsystem is
lying about fsyncs, then postgresql can't guarantee your data anyway.
So, it really depends on your hardware.  Standard test to make sure
your hardware is ok is to install postgresql, start a lot of
transactions at once, and walk around back and pull the power plug.
If it comes back up a half dozen times without errors you're probably
ok, but hey, there could still be a corner case out there too.  Bonus
points if you initiate checkpoint that'll take a few minutes before
you pull the plug, increasing the chance you'll find problems.

With 7.4 there's a real likelihood that there are data loss bugs in
there that have never been fixed and never will be.

> B) If there has been improvement / enhancements done by Postgres regarding
> the way it handles corrupt indexes can you please pass me more information
> about the bug Id or some documentation on it ? Our application does not do
> any REINDEXING . I am in a dilemma if we should seriously incorporate it in
> our application .

Of course, there's been lots of improvements since 7.4  But being a
database when it encounters errors it tries not to guess too much
about what you want.  IS a reindex the right thing to do?  Maybe,
maybe not.  That's the job of the DBA to figure out.  Regular
reindexing is not needed and if your particular machine does need it
you need to figure out why and change it so that it's not needed.  If
indexes are getting corrupted, chances are so are tables and you'll
notice too late.

> I ideally want to push to a higher version of Postgres . If I can prove that
> there will be significant performance benefits and that crashes won’t occur
> then I will be able to present a strong case .

Hehe.  It would be hard to NOT get significant performance
improvements moving from 7.4 to 9.0.  Heck our load on our production
servers went from 12 to 3 or so when we went from 8.1 to 8.3.  Saved
us a ton on what we would have had to spend to keep 8.1 happy.
Install a test version of 9.0 on a laptop, point your test servers at
it, and watch it outrun your production database for 90% of everything
you do.

We run 8.3 and 8.4 in production and they are literally light years
ahead of 7.4 in terms of stability, performance, and capabilities.
Plus when you find a problem in one of them, it gets fixed, fast.
They're still supported.  Just that would be enough to justify an
upgrade for me.

Re: Suspected Postgres Datacorruption

From
"Nicholson, Brad (Toronto, ON, CA)"
Date:
> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-
> owner@postgresql.org] On Behalf Of Scott Marlowe
> Sent: Thursday, August 04, 2011 5:22 PM
> To: Sumeet Jauhar
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Suspected Postgres Datacorruption
>
> > I ideally want to push to a higher version of Postgres . If I can
> prove that
> > there will be significant performance benefits and that crashes won't
> occur
> > then I will be able to present a strong case .
>
> Hehe.  It would be hard to NOT get significant performance
> improvements moving from 7.4 to 9.0.  Heck our load on our production
> servers went from 12 to 3 or so when we went from 8.1 to 8.3.  Saved
> us a ton on what we would have had to spend to keep 8.1 happy.
> Install a test version of 9.0 on a laptop, point your test servers at
> it, and watch it outrun your production database for 90% of everything
> you do.

At a previous engagement, when we moved from 7.4 to 8.1 we saw a huge drop in transaction times.  I don't remember the
numbersbut it was substantial.  We also suffered very badly from checkpoint problems with 7.4, and we were able to tune
themout in 8.1.  When we went from 8.1 to 8.3, there wasn't an improvement in response times but we were able to
deliverthe same level of performance using a fraction of the I/O (due to HOT, autovacuum improvements the checkpoint
smoothingstuff). 

We also ran 7.4 for quite a while (on reliable hardware), and never had any corruption problems except for some index
corruptionissues - but that bug was pretty obscure and was fixed in 7.4 

Brad.

Re: Suspected Postgres Datacorruption

From
"Nicholson, Brad (Toronto, ON, CA)"
Date:
> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-
> owner@postgresql.org] On Behalf Of Nicholson, Brad (Toronto, ON, CA)
> Sent: Thursday, August 04, 2011 5:47 PM
> To: Scott Marlowe; Sumeet Jauhar
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Suspected Postgres Datacorruption
>
>
> We also ran 7.4 for quite a while (on reliable hardware), and never had
> any corruption problems except for some index corruption issues - but
> that bug was pretty obscure and was fixed in 7.4

By the way - to the original person asking about 7.4 do not view this as an endorsement.  I would not trust my data to
7.4any longer. 

Brad.

Re: Suspected Postgres Datacorruption

From
Scott Marlowe
Date:
On Thu, Aug 4, 2011 at 8:33 PM, Sumeet Jauhar <sumeet.jauhar@gmail.com> wrote:
>
>    [ Sumeet ] ok so i agree we need to move ahead and shift to a higher
> version . But how do we decide that . Which one would you say is the
> stablest version of Postgres [ still supported version ] out in the market
> below beacuse Brad here  says his 8.1  version did have performance impacts
> .  Brad - How had you decide on the version . Was it the latest version
> available at that point in time or there was someother reason ? I am also
> pretty sure that upgrading 2 times would not have been easy .

I would upgrade to either 8.2 or 9.0 and here's my reasons.  with 8.2
you still have implicit casts, which your application may depend upon.
 Most other changes between 7.4 and 8.2 were pretty small, so if
you've got a lot of implicit casts in your SQL, 8.2 will be the least
painful of the upgrades to late model pgsqls.  HOWEVER, 8.2 is getting
pretty old now and performance wise 9.0 will pretty handily beat it.
In terms of stability, there are no reports of any versions after
about 8.1 or 8.2 being particularly unstable, but keep in mind that
support for 8.1 and 8.2 will be ending / may have ended already, so if
you can possibly test against 9.0 and see if it works well enough,
then you should really do so.  The changes to things like autovacuum
getting multi-threaded (8.3) HOT updates (8.3) on disk tracking of
free space map (8.4) and a few other big breakthroughs make going to
the latest (9.0) or near latest (8.4) much more attractive.  And trust
me, you WILL feel the difference in performance, it's huge from 7.4 to
8.3 and after that the incremental changes are noticeable, if not as
huge.

Re: Suspected Postgres Datacorruption

From
Scott Marlowe
Date:
On Thu, Aug 4, 2011 at 8:40 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> then you should really do so.  The changes to things like autovacuum
> getting multi-threaded (8.3) HOT updates (8.3) on disk tracking of

Wait, multithreaded autovac may have been put in place in 8.2 .
Anyway, my points still stand, just might be off a version here or
there.


--
To understand recursion, one must first understand recursion.

Re: Suspected Postgres Datacorruption

From
Tom Lane
Date:
Scott Marlowe <scott.marlowe@gmail.com> writes:
> I would upgrade to either 8.2 or 9.0 and here's my reasons.  with 8.2
> you still have implicit casts, which your application may depend upon.
>  Most other changes between 7.4 and 8.2 were pretty small, so if
> you've got a lot of implicit casts in your SQL, 8.2 will be the least
> painful of the upgrades to late model pgsqls.  HOWEVER, 8.2 is getting
> pretty old now and performance wise 9.0 will pretty handily beat it.
> In terms of stability, there are no reports of any versions after
> about 8.1 or 8.2 being particularly unstable, but keep in mind that
> support for 8.1 and 8.2 will be ending / may have ended already, so if
> you can possibly test against 9.0 and see if it works well enough,
> then you should really do so.

See:
http://wiki.postgresql.org/wiki/PostgreSQL_Release_Support_Policy
8.1 is dead already, 8.2 will go off life support this December.

So if you're getting involved in a major-version upgrade now, you
really owe it to yourself to jump to 8.4 or later.  IMO anyway.

(FWIW, I know of no reason to think that 8.4->9.0 is a bigger jump
than any other major-release bump from the application compatibility
standpoint.  Scott is correct to identify the removal of some implicit
casts-to-text in 8.3 as the single largest pain point we've introduced
in recent memory.  Personally I'm betting that this will be eclipsed
by the shift to standard_conforming_strings=on in 9.1 ...)

            regards, tom lane

Re: Suspected Postgres Datacorruption

From
Greg Smith
Date:
Sumeet Jauhar wrote:
>
> Our application is running on Postgres 7.4.X . I agree that this is a
> very old version of Postgres and we should have upgraded .
>

It's important to know the .X here.  The latest 7.4 is 7.4.30:
http://www.postgresql.org/docs/7.4/static/release.html

If you're running a 7.4 much lower than .30, you almost certainly have a
version with corruption bugs related to indexes.  There's a bunch of
them mentioned in the release notes of many 7.4 versions listed there.

> I ideally want to push to a higher version of Postgres . If I can
> prove that there will be significant performance benefits and that
> crashes won’t occur then I will be able to present a strong case .
>

Go visit http://suckit.blog.hu/2009/09/29/postgresql_history for minute.

8.0 is faster than the 7.4 you're running, and that's showing the speed
increase from there.  Your application might easily run 10X as fast on a
newer PostgreSQL version.

Now, on top of all this, it sounds like you might have a problem with
your drives/controller not doing writes reliably.  See
http://wiki.postgresql.org/wiki/Reliable_Writes for more information.
If that's the situation, the version of PostgreSQL you use won't matter
too much--the database will still be unreliable if the hardware is
configured to do the wrong thing.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


Re: Suspected Postgres Datacorruption

From
Sumeet Jauhar
Date:
Yes the very fact that we are using a very very old version of Postgres is certainly causing alot of problems .

On Fri, Aug 5, 2011 at 2:51 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Wed, Aug 3, 2011 at 1:35 AM, Sumeet Jauhar <sumeet.jauhar@gmail.com> wrote:
>
>
> Our application is running on Postgres 7.4.X . I agree that this is a very
> old version of Postgres and we should have upgraded . The issue that we
> faced is that

Wow, that is a very old version.  It has been out of maintenance for a
long time.  If there are data eating bugs in it they aren't gonna get
fixed.

[ Sumeet ] i plan to propose an upgrade soon . This data corruption issue seems to be the best push / driver for me to go ahead and implement it .
 
> 1 . There was a system crash due to a hardware failure .
>
> 2 . When the system came up , we tried to insert a few records into the
> database . However at this point in time we saw that Postgres was taking a
> lot of CPU & memory .
>
> Around 42% CPU consumption . This was a cause of concern .
>
> 3 . We re-indexed the database and it helped reduce the cpu & memory
> consumption .
>
> My question is
>
> A ) Isn’t Postgres database resilient enough to handle hardware system
> failure ? or it sometime results in a corrupt index for its tables ? I read
> on the Postgres site that hardware failure can cause corrupt indexes .
> Besides this are there any other scenario which may result in such
> corruption .

Depends on the hardware failure.  If your RAID controller starts
writing garbage to the drive array, how exactly should postgresql fix
that?  OTOH, if you just have a big boom and the power supply goes
out, most the time you're fine.  Of course, if the drive subsystem is
lying about fsyncs, then postgresql can't guarantee your data anyway.
So, it really depends on your hardware.  Standard test to make sure
your hardware is ok is to install postgresql, start a lot of
transactions at once, and walk around back and pull the power plug.
If it comes back up a half dozen times without errors you're probably
ok, but hey, there could still be a corner case out there too.  Bonus
points if you initiate checkpoint that'll take a few minutes before
you pull the plug, increasing the chance you'll find problems.

With 7.4 there's a real likelihood that there are data loss bugs in
there that have never been fixed and never will be.
  
   [ Sumeet ] The scenario that you have pointed out . ie to go back and unplug the powersupply while there are database operations going on seems a good test case . I will do that and see what possibly happens . A faulty RAID on the system is bound to cause problems . I agree . It will manifest itself in someway . 

> B) If there has been improvement / enhancements done by Postgres regarding
> the way it handles corrupt indexes can you please pass me more information
> about the bug Id or some documentation on it ? Our application does not do
> any REINDEXING . I am in a dilemma if we should seriously incorporate it in
> our application .

Of course, there's been lots of improvements since 7.4  But being a
database when it encounters errors it tries not to guess too much
about what you want.  IS a reindex the right thing to do?  Maybe,
maybe not.  That's the job of the DBA to figure out.  Regular
reindexing is not needed and if your particular machine does need it
you need to figure out why and change it so that it's not needed.  If
indexes are getting corrupted, chances are so are tables and you'll
notice too late.

   [ Sumee ] Thanks . i was of the opinion that re-indexing could be incorporated as a precautionary thing everytime the system crashes . However the hard part is to do it only when the system crashes . and the harder part is to know that the system has actually crashed and its not a simple reboot .
DBA should help me . WIll do that .
 
> I ideally want to push to a higher version of Postgres . If I can prove that
> there will be significant performance benefits and that crashes won’t occur
> then I will be able to present a strong case .

Hehe.  It would be hard to NOT get significant performance
improvements moving from 7.4 to 9.0.  Heck our load on our production
servers went from 12 to 3 or so when we went from 8.1 to 8.3.  Saved
us a ton on what we would have had to spend to keep 8.1 happy.
Install a test version of 9.0 on a laptop, point your test servers at
it, and watch it outrun your production database for 90% of everything
you do.

We run 8.3 and 8.4 in production and they are literally light years
ahead of 7.4 in terms of stability, performance, and capabilities.
Plus when you find a problem in one of them, it gets fixed, fast.
They're still supported.  Just that would be enough to justify an
upgrade for me.
 
   [ Sumeet ] ok so i agree we need to move ahead and shift to a higher version . But how do we decide that . Which one would you say is the stablest version of Postgres [ still supported version ] out in the market below beacuse Brad here  says his 8.1  version did have performance impacts .  Brad - How had you decide on the version . Was it the latest version available at that point in time or there was someother reason ? I am also pretty sure that upgrading 2 times would not have been easy .

Re: Suspected Postgres Datacorruption

From
Sumeet Jauhar
Date:
Thank you . Scott and Brad . Valuable information for sure . I plan to
browse through the documentation for Postgres 9 and identify all the
potential advantages that it will bring to our application . As
rightly pointed out 8.2 may be on the path to obsolescence .

On Friday, August 5, 2011, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Thu, Aug 4, 2011 at 8:40 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> then you should really do so.  The changes to things like autovacuum
>> getting multi-threaded (8.3) HOT updates (8.3) on disk tracking of
>
> Wait, multithreaded autovac may have been put in place in 8.2 .
> Anyway, my points still stand, just might be off a version here or
> there.
>
>
> --
> To understand recursion, one must first understand recursion.
>