On Tue, Aug 5, 2014 at 8:04 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> To decide whether we need to re-copy the file, we read the file until
> we find a block with a later LSN. If we read the whole file without
> finding a later LSN then we don't need to re-copy. That means we read
> each file twice, which is slower, but the file is at most 1GB in size,
> which we can assume will be mostly in memory for the second read.
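
If I'm reading that right, the per-file check boils down to something
like the sketch below. It's only an illustration, not the patch itself:
it assumes the tool runs on the same architecture as the server (pd_lsn
is stored in native byte order) and relies only on pd_lsn being the
first 8 bytes of every page header.

/*
 * Sketch: scan one relation segment and report whether any page has an
 * LSN newer than the given backup-start LSN.  If none does, the file
 * doesn't need to be re-copied.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLCKSZ 8192

int
main(int argc, char **argv)
{
    FILE       *fp;
    uint64_t    start_lsn;
    unsigned char page[BLCKSZ];
    int         needs_copy = 0;

    if (argc != 3)
    {
        fprintf(stderr, "usage: %s SEGMENT-FILE START-LSN\n", argv[0]);
        return 1;
    }

    start_lsn = strtoull(argv[2], NULL, 0);
    fp = fopen(argv[1], "rb");
    if (fp == NULL)
    {
        perror("fopen");
        return 1;
    }

    while (fread(page, 1, BLCKSZ, fp) == BLCKSZ)
    {
        uint32_t    hi, lo;
        uint64_t    page_lsn;

        memcpy(&hi, page, 4);       /* pd_lsn.xlogid  */
        memcpy(&lo, page + 4, 4);   /* pd_lsn.xrecoff */
        page_lsn = ((uint64_t) hi << 32) | lo;

        if (page_lsn > start_lsn)
        {
            /* block was modified after the backup-start LSN */
            needs_copy = 1;
            break;
        }
    }

    fclose(fp);
    printf("%s\n", needs_copy ? "re-copy" : "skip");
    return 0;
}
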
That seems reasonable, although copying only the changed blocks
doesn't seem like it would be a whole lot harder. Yes, you'd need a
tool to copy those blocks back into the places where they need to go,
but that's probably not a lot of work and the disk savings, in many
cases, would be enormous.
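
For illustration only, the restore-side tool could be about as simple
as the sketch below; the incremental-file format here (a plain sequence
of a uint32 block number followed by an 8kB page image) is invented for
the example, not a proposal.

/*
 * Sketch: apply an incremental file consisting of
 * [uint32 block number][8192-byte page image] records onto the base
 * copy of a relation segment.
 */
#include <stdint.h>
#include <stdio.h>

#define BLCKSZ 8192

int
main(int argc, char **argv)
{
    FILE       *in;
    FILE       *out;
    uint32_t    blkno;
    unsigned char page[BLCKSZ];

    if (argc != 3)
    {
        fprintf(stderr, "usage: %s INCREMENTAL-FILE TARGET-SEGMENT\n",
                argv[0]);
        return 1;
    }

    in = fopen(argv[1], "rb");
    out = fopen(argv[2], "r+b");
    if (in == NULL || out == NULL)
    {
        perror("fopen");
        return 1;
    }

    /* each record: block number, then the full page image */
    while (fread(&blkno, sizeof(blkno), 1, in) == 1)
    {
        if (fread(page, 1, BLCKSZ, in) != BLCKSZ)
        {
            fprintf(stderr, "truncated record for block %u\n",
                    (unsigned int) blkno);
            return 1;
        }
        /* a segment is at most 1GB, so the offset fits in a long */
        if (fseek(out, (long) blkno * BLCKSZ, SEEK_SET) != 0 ||
            fwrite(page, 1, BLCKSZ, out) != BLCKSZ)
        {
            perror("write block");
            return 1;
        }
    }

    fclose(in);
    fclose(out);
    return 0;
}
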
> As Marco says, that can be optimized using filesystem timestamps instead.
The idea of using filesystem timestamps gives me the creeps. Those
aren't always very granular, and I don't know that (for example) they
are crash-safe. Does every filesystem on every platform make sure
that the mtime update hits the disk before the data? What about clock
changes made manually by users, or automatically by ntpd? I recognize
that there are people doing this today, because it's what we have, and
it must not suck too much, because people are still doing it ... but I
worry that if we do it this way, we'll end up with people saying
"PostgreSQL corrupted my data" and will have no way of tracking the
problem back to the filesystem or system clock event that was the true
cause of the problem, so they'll just blame the database.
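
Just to be concrete about what the timestamp check amounts to (and why
it makes me nervous), here's a sketch of the mtime-based filter; the
caveats above apply and are noted in the comments.

/*
 * Sketch: skip any file whose mtime predates the previous backup's
 * start time.  Caveats: mtime granularity may be as coarse as one
 * second, the mtime update is not guaranteed to reach disk before the
 * data it describes, and the comparison is only as trustworthy as the
 * system clock.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>

/* return 1 if the file must be re-copied, 0 if it can be skipped */
static int
needs_copy_by_mtime(const char *path, time_t prev_backup_start)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 1;               /* can't tell, so copy it to be safe */

    /*
     * Use >= rather than >: with one-second granularity, a file touched
     * in the same second the previous backup started could otherwise be
     * missed.
     */
    return st.st_mtime >= prev_backup_start;
}

int
main(int argc, char **argv)
{
    time_t      prev;

    if (argc != 3)
    {
        fprintf(stderr, "usage: %s FILE PREV-BACKUP-START-EPOCH\n",
                argv[0]);
        return 1;
    }
    prev = (time_t) strtoll(argv[2], NULL, 10);
    printf("%s\n", needs_copy_by_mtime(argv[1], prev) ? "copy" : "skip");
    return 0;
}
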
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company