Re: Page Checksums - Mailing list pgsql-hackers

From Benedikt Grundmann
Subject Re: Page Checksums
Date
Msg-id 20120110092542.GJ6419@ldn-qws-004.delacy.com
In response to Re: Page Checksums  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On 10/01/12 09:07, Simon Riggs wrote:
> > You can repeat that argument ad infinitum. Even if the CRC covers all the
> > pages in the OS buffer cache, it still doesn't cover the pages in the
> > shared_buffers, CPU caches, in-transit from one memory bank to another etc.
> > You have to draw the line somewhere, and it seems reasonable to draw it
> > where the data moves between long-term storage, ie. disk, and RAM.
> 
> We protect each change with a CRC when we write WAL, so doing the same
> thing doesn't sound entirely unreasonable, especially if your database
> fits in RAM and we aren't likely to be doing I/O anytime soon. The
> long term storage argument may no longer apply in a world with very
> large memory.
> 
I'm not so sure about that.  Our experience is that storage and memory
don't grow as fast as demand.  Maybe we are in a minority, but at
Jane Street memory size < database size is sadly true for most of the
important databases.

Concretely, the two most important databases are

715 GB

and

473 GB 

in size (the second used to be much closer to the first one in size but
we recently archived a lot of data).

In both databases there is a small set of tables that use the majority of
the disk space.  Those tables are also the most used tables.  Typically
each of those tables is between 1x and 3x the size of memory, and the
cumulative size of all indices on a table is normally roughly the same
as the size of the table itself.

Cheers,

Bene

