Re: ECC RAM really needed? - Mailing list pgsql-performance
From | mark@mark.mielke.cc |
---|---|
Subject | Re: ECC RAM really needed? |
Date | |
Msg-id | 20070526145214.GA21290@mark.mielke.cc |
In response to | Re: ECC RAM really needed? (Michael Stone <mstone+postgres@mathom.us>) |
Responses | Re: ECC RAM really needed? |
List | pgsql-performance |
On Sat, May 26, 2007 at 08:43:15AM -0400, Michael Stone wrote:
> On Fri, May 25, 2007 at 06:45:15PM -0700, Craig James wrote:
> >We're thinking of building some new servers. We bought some a while back
> >that have ECC (error correcting) RAM, which is absurdly expensive compared
> >to the same amount of non-ECC RAM. Does anyone have any real-life data
> >about the error rate of non-ECC RAM, and whether it matters or not? In my
> >long career, I've never once had a computer that corrupted memory, or at
> >least I never knew if it did.
> ...because ECC RAM will correct single bit errors. FWIW, I've seen *a
> lot* of single bit errors over the years. Some systems are much better
> about reporting than others, but any system will have occasional errors.
> Also, if a stick starts to go bad you'll generally be told about with
> ECC memory, rather than having the system just start to flake out.

First: I would use ECC RAM for a server. The memory is not significantly
more expensive.

Now that this is out of the way - I found this thread interesting because
although it talked about RAM bit errors, I haven't seen any reference to
the significance of RAM bit errors.

Quite a bit of memory is only rarely used (sent out to swap or flushed
before it is accessed), or used in a read-only capacity in a limited form.
For example, when searching table rows - as long as the row is not
selected, and the bit error is in a field that isn't involved in the
selection criteria, who cares if it is wrong?

So, the question then becomes: what percentage of memory is required to be
correct all of the time? I believe the estimates for bit errors are high
estimates with regard to actual effect. Stating that a bit may be wrong
once every two weeks does not describe the effect. In my opinion, software
defects carry a similar estimated potential for damage.
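
To make the distinction between raw error rate and actual effect concrete,
here is a rough back-of-the-envelope sketch. It takes the "one wrong bit
every two weeks" figure mentioned above and scales it by the fraction of
memory whose contents actually have to be correct. The function name and the
fraction values (1.0, 0.5, 0.1) are purely illustrative assumptions, not
measurements, and the model assumes errors land uniformly across memory.

```python
# Hypothetical model: the fractions used below are assumptions for
# illustration only, not measured data.
DAYS_PER_YEAR = 365


def consequential_errors_per_year(days_between_errors: float,
                                  fraction_that_matters: float) -> float:
    """Expected bit errors per year that hit memory whose contents matter.

    days_between_errors:   average days between raw bit errors (e.g. 14).
    fraction_that_matters: share of memory that must be correct, in [0, 1].
    """
    if days_between_errors <= 0:
        raise ValueError("days_between_errors must be positive")
    if not 0.0 <= fraction_that_matters <= 1.0:
        raise ValueError("fraction_that_matters must be between 0 and 1")
    raw_errors_per_year = DAYS_PER_YEAR / days_between_errors
    # Assumes errors are spread uniformly over all of memory.
    return raw_errors_per_year * fraction_that_matters


if __name__ == "__main__":
    for fraction in (1.0, 0.5, 0.1):  # assumed values, not measurements
        rate = consequential_errors_per_year(14, fraction)
        print(f"fraction={fraction:.1f}: about {rate:.1f} errors/year that matter")
```

The point is only that the headline rate is an upper bound: the smaller the
share of memory that must be correct, the fewer raw errors have any visible
effect. The real share depends entirely on the workload.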

In the last 10 years - the only problems with memory I have ever
successfully diagnosed were with cheap hardware running in a poor
environment, where the problem became quickly obvious, to the point that
the system would be unusable or the BIOS would refuse to boot with the
broken memory stick. (This paragraph represents the primary state of many
of my father's machines :-) ) Replacing the memory stick made the problems
go away.

In any case - the word 'cheap' is significant in the above paragraph.
Non-ECC RAM should be considered 'cheap' memory. It will work fine most of
the time and most people will never notice a problem. Do you want to be
the one person who does notice a problem? :-)

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   |
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/