On 10/01/12 09:07, Simon Riggs wrote:
> > You can repeat that argument ad infinitum. Even if the CRC covers all the
> > pages in the OS buffer cache, it still doesn't cover the pages in the
> > shared_buffers, CPU caches, in-transit from one memory bank to another etc.
> > You have to draw the line somewhere, and it seems reasonable to draw it
> > where the data moves between long-term storage, ie. disk, and RAM.
>
> We protect each change with a CRC when we write WAL, so doing the same
> thing doesn't sound entirely unreasonable, especially if your database
> fits in RAM and we aren't likely to be doing I/O anytime soon. The
> long term storage argument may no longer apply in a world with very
> large memory.
>
I'm not so sure about that. Our experience is that storage and memory
don't grow as fast as demand. Maybe we are in a minority, but at
Jane Street memory size < database size is sadly true for most of the
important databases.
Concretely, the two most important databases are 715 GB and 473 GB in
size (the second used to be much closer to the first in size, but we
recently archived a lot of data).
In both databases there is a small set of tables that use the majority of
the disk space. Those tables are also the most used tables. Typically
the size of one of those tables is between 1x and 3x the size of memory. And the
cumulative size of all indices on the table is normally roughly the same
size as the table.
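
To put rough numbers on that, here is a back-of-the-envelope sketch. The
function name and the default index-to-table ratio of 1.0 are illustrative
assumptions taken from the rough description above, not measurements:

```python
def footprint_ratio(table_to_ram: float, index_to_table: float = 1.0) -> float:
    """Return (table + all its indexes) as a multiple of RAM.

    table_to_ram:   table size divided by memory size (1-3 in the cases above)
    index_to_table: cumulative index size divided by table size
                    (assumed ~1.0, i.e. "roughly the same size as the table")
    """
    return table_to_ram * (1.0 + index_to_table)


if __name__ == "__main__":
    # Sweep the 1x-3x range mentioned above.
    for ratio in (1.0, 2.0, 3.0):
        total = footprint_ratio(ratio)
        print(f"table = {ratio:.0f}x RAM -> table + indexes = {total:.0f}x RAM")
```

So under those assumptions, a single hot table together with its indices
comes to roughly 2x to 6x the size of memory.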
Cheers,
Bene