Re: Suspected Postgres Datacorruption - Mailing list pgsql-performance

From Sumeet Jauhar
Subject Re: Suspected Postgres Datacorruption
Date
Msg-id CAKN1XtroO6PnP6YjB4MLrYmLMKSZS+WuxMvmS9zdSCqdE06=wA@mail.gmail.com
Whole thread Raw
In response to Re: Suspected Postgres Datacorruption  (Scott Marlowe <scott.marlowe@gmail.com>)
List pgsql-performance
Yes the very fact that we are using a very very old version of Postgres is certainly causing alot of problems .

On Fri, Aug 5, 2011 at 2:51 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Wed, Aug 3, 2011 at 1:35 AM, Sumeet Jauhar <sumeet.jauhar@gmail.com> wrote:
>
>
> Our application is running on Postgres 7.4.X . I agree that this is a very
> old version of Postgres and we should have upgraded . The issue that we
> faced is that

Wow, that is a very old version.  It has been out of maintenance for a
long time.  If there are data eating bugs in it they aren't gonna get
fixed.

[ Sumeet ] i plan to propose an upgrade soon . This data corruption issue seems to be the best push / driver for me to go ahead and implement it .
 
> 1 . There was a system crash due to a hardware failure .
>
> 2 . When the system came up , we tried to insert a few records into the
> database . However at this point in time we saw that Postgres was taking a
> lot of CPU & memory .
>
> Around 42% CPU consumption . This was a cause of concern .
>
> 3 . We re-indexed the database and it helped reduce the cpu & memory
> consumption .
>
> My question is
>
> A ) Isn’t Postgres database resilient enough to handle hardware system
> failure ? or it sometime results in a corrupt index for its tables ? I read
> on the Postgres site that hardware failure can cause corrupt indexes .
> Besides this are there any other scenario which may result in such
> corruption .

Depends on the hardware failure.  If your RAID controller starts
writing garbage to the drive array, how exactly should postgresql fix
that?  OTOH, if you just have a big boom and the power supply goes
out, most the time you're fine.  Of course, if the drive subsystem is
lying about fsyncs, then postgresql can't guarantee your data anyway.
So, it really depends on your hardware.  Standard test to make sure
your hardware is ok is to install postgresql, start a lot of
transactions at once, and walk around back and pull the power plug.
If it comes back up a half dozen times without errors you're probably
ok, but hey, there could still be a corner case out there too.  Bonus
points if you initiate checkpoint that'll take a few minutes before
you pull the plug, increasing the chance you'll find problems.

With 7.4 there's a real likelihood that there are data loss bugs in
there that have never been fixed and never will be.
  
   [ Sumeet ] The scenario that you have pointed out . ie to go back and unplug the powersupply while there are database operations going on seems a good test case . I will do that and see what possibly happens . A faulty RAID on the system is bound to cause problems . I agree . It will manifest itself in someway . 

> B) If there has been improvement / enhancements done by Postgres regarding
> the way it handles corrupt indexes can you please pass me more information
> about the bug Id or some documentation on it ? Our application does not do
> any REINDEXING . I am in a dilemma if we should seriously incorporate it in
> our application .

Of course, there's been lots of improvements since 7.4  But being a
database when it encounters errors it tries not to guess too much
about what you want.  IS a reindex the right thing to do?  Maybe,
maybe not.  That's the job of the DBA to figure out.  Regular
reindexing is not needed and if your particular machine does need it
you need to figure out why and change it so that it's not needed.  If
indexes are getting corrupted, chances are so are tables and you'll
notice too late.

   [ Sumee ] Thanks . i was of the opinion that re-indexing could be incorporated as a precautionary thing everytime the system crashes . However the hard part is to do it only when the system crashes . and the harder part is to know that the system has actually crashed and its not a simple reboot .
DBA should help me . WIll do that .
 
> I ideally want to push to a higher version of Postgres . If I can prove that
> there will be significant performance benefits and that crashes won’t occur
> then I will be able to present a strong case .

Hehe.  It would be hard to NOT get significant performance
improvements moving from 7.4 to 9.0.  Heck our load on our production
servers went from 12 to 3 or so when we went from 8.1 to 8.3.  Saved
us a ton on what we would have had to spend to keep 8.1 happy.
Install a test version of 9.0 on a laptop, point your test servers at
it, and watch it outrun your production database for 90% of everything
you do.

We run 8.3 and 8.4 in production and they are literally light years
ahead of 7.4 in terms of stability, performance, and capabilities.
Plus when you find a problem in one of them, it gets fixed, fast.
They're still supported.  Just that would be enough to justify an
upgrade for me.
 
   [ Sumeet ] ok so i agree we need to move ahead and shift to a higher version . But how do we decide that . Which one would you say is the stablest version of Postgres [ still supported version ] out in the market below beacuse Brad here  says his 8.1  version did have performance impacts .  Brad - How had you decide on the version . Was it the latest version available at that point in time or there was someother reason ? I am also pretty sure that upgrading 2 times would not have been easy .

pgsql-performance by date:

Previous
From: "mark"
Date:
Subject: Re: benchmark woes and XFS options
Next
From: Sumeet Jauhar
Date:
Subject: Re: Suspected Postgres Datacorruption