On Mon, 2007-06-25 at 19:06 +0900, Koichi Suzuki wrote:
> Yeah, I agree we should carefully follow how Done really did a backup.
> My point is PostgreSQL may have to extend the file during the hot backup
> to write to the new block.
If the snapshot is a consistent, point-in-time copy then I don't see how
any I/O at all makes a difference. To my knowledge, both EMC and NetApp
produce snapshots like this. IIRC, EMC calls these instant snapshots,
NetApp calls them frozen snapshots.
> It is slightly different from Oracle's case.
> Oracle allocates all the database space in advance so that there could
> be no risk to modify the metadata on the fly.
Not really sure it's different.
Oracle allows dynamic file extensions and I've got no evidence that file
extension is prevented from occurring during backup simply as a result
of issuing the start hot backup command.
Oracle and DB2 both support a stop-I/O-to-the-database mode. My
understanding is that isn't required any more if you do an instant
snapshot, so if people are using instant snapshots it should certainly
be the case that they are safe to do this with PostgreSQL also.
Oracle is certainly more picky about snapshotted files than PostgreSQL
is. In Oracle, each file has a header with the LSN of the last
checkpoint in it. This is used at recovery time to ensure the backup is
consistent by having exactly equal LSNs across all files. PostgreSQL
doesn't use file headers and we don't store the LSN on a per-file basis,
though we do store the LSN in the control file for the whole server.
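
To make that difference concrete, here is a minimal Python sketch of the
per-file check described above. It is a toy model only: the FileHeader type,
the file names and the LSN values are all invented for illustration and do
not reflect either product's actual on-disk format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHeader:
    """Hypothetical per-file header carrying the last checkpoint LSN."""
    name: str
    checkpoint_lsn: int


def headers_agree(headers):
    """Return True when every file header carries the same checkpoint LSN."""
    return len({h.checkpoint_lsn for h in headers}) <= 1


# Invented example values: in the second set, one file was captured
# after a later checkpoint than the other.
consistent = [FileHeader("data01", 1000), FileHeader("data02", 1000)]
mixed = [FileHeader("data01", 1000), FileHeader("data02", 1040)]

print(headers_agree(consistent))
print(headers_agree(mixed))
```

In this model PostgreSQL has no per-file values to compare, only the single
LSN held in the control file for the whole server.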
> In our case, because SAN
> based storage snapshot is device level, not file system level, even a
> file system does not know that the snapshot is being taken and we might
> encounter the case where metadata and/or user data are not consistent.
> Such snapshot (whole filesystem) might be corrupted and cause file
> system level error.
>
> I'm interested in this. Any further comment/opinion is welcome.
If you can show me either
i) an error that occurs after the full and correct PostgreSQL hot backup
procedures have been executed, or
ii) a conjecture that explains in detail how a device-level
error might occur
then I will look into this further.
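
For reference, the order of operations I have in mind for the hot backup
procedure can be sketched as below. This is an outline, not a backup script:
hot_backup_steps is a hypothetical helper, the snapshot step is a placeholder
because it is vendor-specific, and it assumes WAL archiving is already
configured. pg_start_backup() and pg_stop_backup() are the function names in
the current releases.

```python
def hot_backup_steps(label):
    """Return the ordered steps of a snapshot-based hot backup (outline only)."""
    # Escape single quotes so the label is a valid SQL string literal.
    quoted = label.replace("'", "''")
    return [
        f"SELECT pg_start_backup('{quoted}');",
        "# take the storage snapshot of the data directory (vendor-specific)",
        "SELECT pg_stop_backup();",
        "# keep the WAL segments archived between start and stop with the backup",
    ]


for step in hot_backup_steps("san_snapshot"):
    print(step)
```

An error report is only useful for case i) if the snapshot was taken between
the first and third steps and the archived WAL was kept with it.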
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com