Re: Hot Standby and VACUUM FULL - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Hot Standby and VACUUM FULL
Date
Msg-id 20417.1265039112@sss.pgh.pa.us
Whole thread Raw
In response to Re: Hot Standby and VACUUM FULL  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Hot Standby and VACUUM FULL  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Tom Lane wrote:
>> The other problem with what you sketch is that it'd require holding the
>> mapfile write lock across commit, because we still have to have strict
>> serialization of updates.

> Why is the strict serialization of updates needed? To avoid overwriting
> the file with stale contents in a race condition?

Exactly.

> I was thinking that we only store the modified part in the WAL record.
> Right after writing commit record, take the lock, read() the map file,
> modify it in memory, write() it back, and release lock.

> That means that there's no full images of the file in WAL records, which
> makes me slightly uncomfortable from a disaster recovery point-of-view,

Yeah, me too, which is another advantage of using a separate WAL entry.

> That doesn't solve the problem I was trying to solve, which is that if
> the map file is rewritten, but the transaction later aborts, the map
> file update has hit the disk already. That's why I wanted to stash it
> into the commit record.

The design I sketched doesn't require such an assumption anyway.  Once
the map file is written, the relocation is effective, commit or no.
As long as we restrict relocations to maintenance operations such as
VACUUM FULL, which have no transactionally significant results, this
doesn't seem like a problem.  What we do need is that after a CLUSTER
or V.F., which is going to relocate not only the rel but its indexes,
the relocations of the rel and its indexes have to all "commit"
atomically.  But saving up the transaction's map changes and applying
them in one write takes care of that.

(What I believe this means is that pg_class and its indexes have to all
be mapped, but I'm thinking right now that no other non-shared catalogs
need the treatment.)
        regards, tom lane


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Review: listagg aggregate
Next
From: Robert Haas
Date:
Subject: Re: Package namespace and Safe init cleanup for plperl UPDATE 3 [PATCH]