Re: Add checksums without --initdb - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Add checksums without --initdb
Date
Msg-id 20150702202820.GJ16267@alap3.anarazel.de
In response to Re: Add checksums without --initdb  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Add checksums without --initdb
List pgsql-hackers
On 2015-07-02 22:53:40 +0300, Heikki Linnakangas wrote:
> On 07/02/2015 10:39 PM, David Christensen wrote:
> >Possible concerns here are whether checksums are included in WAL
> >full_page_writes or if they are independently calculated; if the
> >latter I think we’d be fine.  If checksums are all handled at the
> >layer below WAL then any streamed/processed changes should be fine to
> >get us to the point where we could come up as a master.
> 
> It's not full_page_writes that's the problem, but the server would not
> WAL-log hint bit updates, unless you also have wal_log_hints enabled. But
> that would be simple to just check - wal_log_hints can be enabled with a
> server restart so that's not too onerous.

Hm. Since hint bit writes in that case aren't transported to the
standby I don't see the problem right now? A hint bit write on the
primary won't have any effect on the standby. And on standbys
themselves we "solve" that problem by not dirtying the buffer on hint
bit writes...
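
A toy model of the hint bit handling described above, as a Python
sketch rather than PostgreSQL source. The function name and the two
boolean parameters are hypothetical; "needs_wal_for_hints" stands for
"checksums or wal_log_hints are enabled". It also simplifies the
primary case, where in practice only some hint bit changes need a WAL
record.

```python
def handle_hint_bit_write(in_recovery, needs_wal_for_hints):
    """Return (dirty_buffer, emit_wal) for a hint-bit-only page change.

    Hypothetical helper for illustration; not a PostgreSQL function.
    """
    if not needs_wal_for_hints:
        # No checksums and no wal_log_hints: the change is never WAL-logged,
        # so it is never transported to a standby.
        return (True, False)
    if in_recovery:
        # Standby: WAL cannot be written here, so the buffer is left clean
        # and the hint bit change is simply not persisted.
        return (False, False)
    # Primary with checksums or wal_log_hints: dirty the page and log it.
    return (True, True)
```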

> >Andres suggested a separate tool that would basically rewrite the
> >existing data directory heap files in place, which I can also see a
> >use case for, but I also think there’s some benefit to be found in
> >having it happen while the replica is being streamed/built.
> >
> >Ideas/thoughts/reasons this wouldn’t work?

We don't have, afaik, an easy way to know what kind of page format
individual relations in the base backup are going to have. I think
you'll need access to the catalog for that.

Now, we don't have many non-standard pages that aren't recognizable by
their fork. So you could possibly somehow get away with it. But I don't
think that'll be very robust.

> Add an "enabling-checksums" mode to the server where it calculates checksums
> for anything it writes, but doesn't check or complain about incorrect
> checksums on reads. Put the server into that mode, and then have a
> background process that reads through all data in the cluster, calculates
> the checksum for every page, and writes all the data back. Once that's
> completed, checksums can be fully enabled.
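
The proposal quoted above can be modelled as a three-state switch. The
following is a minimal Python sketch under stated assumptions: the
class and method names are hypothetical, pages are plain byte strings,
and zlib.crc32 is only a stand-in for the real page checksum, which
PostgreSQL computes with its own algorithm.

```python
import zlib
from enum import Enum


class ChecksumMode(Enum):
    OFF = "off"            # neither write nor verify checksums
    ENABLING = "enabling"  # write checksums, do not verify on read
    ON = "on"              # write and verify checksums


class Cluster:
    """Hypothetical in-memory stand-in for a data directory."""

    def __init__(self, pages):
        # Each page is stored as [payload_bytes, checksum_or_None].
        self.pages = [[data, None] for data in pages]
        self.mode = ChecksumMode.OFF

    @staticmethod
    def checksum(data):
        # Stand-in only; not the algorithm PostgreSQL uses.
        return zlib.crc32(data)

    def write_page(self, n, data):
        stamp = self.checksum(data) if self.mode is not ChecksumMode.OFF else None
        self.pages[n] = [data, stamp]

    def read_page(self, n):
        data, stamp = self.pages[n]
        # Only the fully enabled mode complains about a bad checksum.
        if self.mode is ChecksumMode.ON and stamp != self.checksum(data):
            raise IOError(f"checksum mismatch on page {n}")
        return data

    def enable_checksums(self):
        self.mode = ChecksumMode.ENABLING
        # Background pass: read every page and write it back with a checksum.
        for n in range(len(self.pages)):
            self.write_page(n, self.read_page(n))
        self.mode = ChecksumMode.ON
```

The intermediate state is what makes the conversion safe to run on a
live server: pages that have not been rewritten yet carry no valid
checksum, so verification has to stay off until the pass completes.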

You'd need, afaics, a bgworker that connects to every database to read
pg_class, to figure out what type of page a relfilenode has. And this
probably should call back into the relevant AM or such.

Generally having an infrastructure to do this would be very, very good:
It'll allow us to clean up old data in the background. We could much
more realistically reclaim infomask fields etc.

Sounds like it rather conceivably could be integrated into
autovacuum. The launcher recognizes that a database isn't yet in the new
format/checksum state, launches a worker, etc.
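
A rough Python sketch of that launcher idea, purely illustrative: the
Database record, the per-database "converted" flag and the
launcher_pass function are hypothetical names, and the convert
callback stands in for a worker that connects to the database and
rewrites its relations.

```python
from dataclasses import dataclass


@dataclass
class Database:
    name: str
    converted: bool = False  # hypothetical "new format/checksum state" flag


def launcher_pass(databases, max_workers, convert):
    """One launcher cycle: start at most max_workers conversions."""
    launched = []
    for db in databases:
        if len(launched) >= max_workers:
            break
        if not db.converted:
            convert(db)          # stands in for a worker processing this database
            db.converted = True
            launched.append(db.name)
    return launched


def cluster_ready(databases):
    # Checksums could only be fully enabled once every database is done.
    return all(db.converted for db in databases)
```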

Greetings,

Andres Freund


