Thread: AW: Vacuum only with 20% old tuples

AW: Vacuum only with 20% old tuples

From
Zeugswetter Andreas SB
Date:
> > It does not postpone anything. WAL only logs what it does:
> > 
> > 1. log i now start to rename file
> > 2. rename file
> 
> How do we log *rename* ?

No, step 2 was meant to do the actual rename.
You log before and after the actual rename.

> What I've meant by *rename* is to replace an existent table
> file by a (e.g. work temp) file.  So the old table file would
> vanish by renaming. 

you need to rename the old file first then the new file

> How to save the content for rollback ?

with rename of old file

> Is it preferable to save the content entirely to WAL log file ?

no, unless you have log space to spare, but not enough data space,
which is imho unlikely (or to be considered later).

> If *rename* is possible,are OID filenames necessary in the
> first place ? 

no

> 
> > 3. log rename successful or abort txn
> >
>  
> > > 
> > > No there's a significant difference between the failure 
> of  'delete'
> > > and that of 'rename'.  We would have no consistency problem even
> > > though 'delete' fails and wouldn't have to stop postmaster. But we
> > > wouldn't be able to see the renamed relation in case of 'rename'
> > > failure and an excellent(??) dba would have to recover the 
> > > inconsistency.
> > 
> > The dba only fixes the underlying problem, like filesystem 
> > mounted readonly 
> > or wrong permissions on directory. He then simply starts 
> the postmaster
> > again,
> > the rollforward with rename will then succeed.
> >
> 
> Mustn't a dba be there to restart the postmaster when *rename*
> fails ? We don't need a dba even when *delete* fails after commit.

The dba needs to clean up unused file space later, 
which leaves the problem for the dba to decide which file is needed 
and what not (quite difficult with an oid filename).

But, rename is only supposed to fail if something is wrong with the
filesystem or directory. There will be a lot of other problems,
like creating sort files etc if that is the case. 
If a rename fails under normal operation the current transaction is aborted,
but you are correct that there is a problem if previous renames in the same 
transaction succeeded, but all further renames (for rollback) fail (is this
likely ?).
In that unlikely case I would bail out, let the dba fix the problem and fix
the db
state at startup rollforward (which can now rename and thus abort the txn). 

Andreas


RE: Vacuum only with 20% old tuples

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Zeugswetter Andreas SB [mailto:ZeugswetterA@wien.spardat.at]
> 
> > > It does not postpone anything. WAL only logs what it does:
> > > 
> > > 1. log i now start to rename file
> > > 2. rename file
> > 
> > How do we log *rename* ?
> 
> No, step 2 was meant to do the actual rename.
> You log before and after the actual rename.
> 
> > What I've meant by *rename* is to replace an existent table
> > file by a (e.g. work temp) file.  So the old table file would
> > vanish by renaming. 
> 
> you need to rename the old file first then the new file
> 
> > How to save the content for rollback ?
> 
> with rename of old file
>

OK,I see. Your idea seems to be close to Vadim's *relink*. 
However it isn't clear if *relink* has no problem.
In my test case on cygwin port ,*relink* doesn't work well.
*relink* caused an error if concurrent backends open the
*relink*ing table. 
> >  
> > > > 
> > > > No there's a significant difference between the failure 
> > of  'delete'
> > > > and that of 'rename'.  We would have no consistency problem even
> > > > though 'delete' fails and wouldn't have to stop postmaster. But we
> > > > wouldn't be able to see the renamed relation in case of 'rename'
> > > > failure and an excellent(??) dba would have to recover the 
> > > > inconsistency.
> > > 
> > > The dba only fixes the underlying problem, like filesystem 
> > > mounted readonly 
> > > or wrong permissions on directory. He then simply starts 
> > the postmaster
> > > again,
> > > the rollforward with rename will then succeed.
> > >
> > 
> > Mustn't a dba be there to restart the postmaster when *rename*
> > fails ? We don't need a dba even when *delete* fails after commit.
> 
> The dba needs to clean up unused file space later,
> which leaves the problem for the dba to decide which file is needed 
> and what not (quite difficult with an oid filename).
>

Yes,I don't want a dba to deal with such an unrelible recovery.
Probably vacuum must be wise enough to remove unremoved
table files before removing dead pg_class tuples.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp


Re: AW: Vacuum only with 20% old tuples

From
Tom Lane
Date:
Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:
>>>> It does not postpone anything. WAL only logs what it does:
>>>> 
>>>> 1. log i now start to rename file
>>>> 2. rename file
>> 
>> How do we log *rename* ?

> No, step 2 was meant to do the actual rename.
> You log before and after the actual rename.

The trouble with depending on WAL to handle undo of external operations
(ie, filesystem calls as opposed to DB record updates) is that you don't
know all the facts if you have to recover/undo the operation.  Consider
the following not-too-implausible example:
begin transaction;alter table foo rename to bar;create table foo ...;delete table bar;<crash>

If we try to implement this by log entries, we will have log entries
for rename, create, delete.  We scan the log and come upon the rename.
Hmm, there's a file foo and no file bar ... looks like the rename didn't
get done, so do it.  Ooops.

Related nasty scenarios arise when there's a file there of the right
name but the wrong contents, created by non-database operations.
(Consider the risks involved with a table named "core", for example.)

An even more serious problem is that you can't guarantee to be able
to either repeat or undo an external operation, because there are
factors not under your control (kernel permission checks, existing
files getting in the way, etc).  Sure, failures like that are not
very likely, but that'll be cold comfort to someone whose database
gets corrupted because a file rename needs to be undone and can't be.

>> If *rename* is possible,are OID filenames necessary in the
>> first place ? 

> no

I think that OID filenames will be a far more reliable solution to
the alter-table-rename problem than anything that involves trying
to actually rename the files.  There are other benefits too, like
not being tied to filesystem-based naming restrictions for tables.
        regards, tom lane