Thread: AW: Vacuum only with 20% old tuples
> > It does not postpone anything. WAL only logs what it does: > > > > 1. log i now start to rename file > > 2. rename file > > How do we log *rename* ? No, step 2 was meant to do the actual rename. You log before and after the actual rename. > What I've meant by *rename* is to replace an existent table > file by a (e.g. work temp) file. So the old table file would > vanish by renaming. you need to rename the old file first then the new file > How to save the content for rollback ? with rename of old file > Is it preferable to save the content entirely to WAL log file ? no, unless you have log space to spare, but not enough data space, which is imho unlikely (or to be considered later). > If *rename* is possible,are OID filenames necessary in the > first place ? no > > > 3. log rename successful or abort txn > > > > > > > > > No there's a significant difference between the failure > of 'delete' > > > and that of 'rename'. We would have no consistency problem even > > > though 'delete' fails and wouldn't have to stop postmaster. But we > > > wouldn't be able to see the renamed relation in case of 'rename' > > > failure and an excellent(??) dba would have to recover the > > > inconsistency. > > > > The dba only fixes the underlying problem, like filesystem > > mounted readonly > > or wrong permissions on directory. He then simply starts > the postmaster > > again, > > the rollforward with rename will then succeed. > > > > Mustn't a dba be there to restart the postmaster when *rename* > fails ? We don't need a dba even when *delete* fails after commit. The dba needs to clean up unused file space later, which leaves the problem for the dba to decide which file is needed and what not (quite difficult with an oid filename). But, rename is only supposed to fail if something is wrong with the filesystem or directory. There will be a lot of other problems, like creating sort files etc if that is the case. If a rename fails under normal operation the current transaction is aborted, but you are correct that there is a problem if previous renames in the same transaction succeeded, but all further renames (for rollback) fail (is this likely ?). In that unlikely case I would bail out, let the dba fix the problem and fix the db state at startup rollforward (which can now rename and thus abort the txn). Andreas
> -----Original Message----- > From: Zeugswetter Andreas SB [mailto:ZeugswetterA@wien.spardat.at] > > > > It does not postpone anything. WAL only logs what it does: > > > > > > 1. log i now start to rename file > > > 2. rename file > > > > How do we log *rename* ? > > No, step 2 was meant to do the actual rename. > You log before and after the actual rename. > > > What I've meant by *rename* is to replace an existent table > > file by a (e.g. work temp) file. So the old table file would > > vanish by renaming. > > you need to rename the old file first then the new file > > > How to save the content for rollback ? > > with rename of old file > OK,I see. Your idea seems to be close to Vadim's *relink*. However it isn't clear if *relink* has no problem. In my test case on cygwin port ,*relink* doesn't work well. *relink* caused an error if concurrent backends open the *relink*ing table. > > > > > > > > > > No there's a significant difference between the failure > > of 'delete' > > > > and that of 'rename'. We would have no consistency problem even > > > > though 'delete' fails and wouldn't have to stop postmaster. But we > > > > wouldn't be able to see the renamed relation in case of 'rename' > > > > failure and an excellent(??) dba would have to recover the > > > > inconsistency. > > > > > > The dba only fixes the underlying problem, like filesystem > > > mounted readonly > > > or wrong permissions on directory. He then simply starts > > the postmaster > > > again, > > > the rollforward with rename will then succeed. > > > > > > > Mustn't a dba be there to restart the postmaster when *rename* > > fails ? We don't need a dba even when *delete* fails after commit. > > The dba needs to clean up unused file space later, > which leaves the problem for the dba to decide which file is needed > and what not (quite difficult with an oid filename). > Yes,I don't want a dba to deal with such an unrelible recovery. Probably vacuum must be wise enough to remove unremoved table files before removing dead pg_class tuples. Regards. Hiroshi Inoue Inoue@tpf.co.jp
Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes: >>>> It does not postpone anything. WAL only logs what it does: >>>> >>>> 1. log i now start to rename file >>>> 2. rename file >> >> How do we log *rename* ? > No, step 2 was meant to do the actual rename. > You log before and after the actual rename. The trouble with depending on WAL to handle undo of external operations (ie, filesystem calls as opposed to DB record updates) is that you don't know all the facts if you have to recover/undo the operation. Consider the following not-too-implausible example: begin transaction;alter table foo rename to bar;create table foo ...;delete table bar;<crash> If we try to implement this by log entries, we will have log entries for rename, create, delete. We scan the log and come upon the rename. Hmm, there's a file foo and no file bar ... looks like the rename didn't get done, so do it. Ooops. Related nasty scenarios arise when there's a file there of the right name but the wrong contents, created by non-database operations. (Consider the risks involved with a table named "core", for example.) An even more serious problem is that you can't guarantee to be able to either repeat or undo an external operation, because there are factors not under your control (kernel permission checks, existing files getting in the way, etc). Sure, failures like that are not very likely, but that'll be cold comfort to someone whose database gets corrupted because a file rename needs to be undone and can't be. >> If *rename* is possible,are OID filenames necessary in the >> first place ? > no I think that OID filenames will be a far more reliable solution to the alter-table-rename problem than anything that involves trying to actually rename the files. There are other benefits too, like not being tied to filesystem-based naming restrictions for tables. regards, tom lane