Greg Stark wrote:
> I think double buffering solves the torn page problem but not the lack
> of WAL logging. Alvaro solved the WAL logging by deferring the WAL
> logs. But I'm not sure how confident we are that it's logging enough.
>
Right now, the patch WAL-logs HeapTupleHeader hint bits (infomask and
infomask2) and ItemId (line pointer) flags. Page pd_flags are excluded
from the CRC; this is easy to do because they sit at a constant offset
in the page, so I'm just skipping those bytes in CRC_COMP().
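
For illustration only (this is not the code in the patch), skipping a
fixed byte range while computing a CRC could look roughly like the
sketch below. The function names, the bitwise CRC-32 loop, and the
offset/size constants are all assumptions made up for the example; the
patch itself does this inside CRC_COMP().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical position of the flag bytes; stands in for pd_flags. */
#define SKETCH_FLAGS_OFFSET 10
#define SKETCH_FLAGS_SIZE   2

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320), no table. */
static uint32_t
sketch_crc32_update(uint32_t crc, const unsigned char *data, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc;
}

/* CRC of the whole page except the flag bytes at the fixed offset. */
uint32_t
sketch_page_crc_skip_flags(const unsigned char *page, size_t page_size)
{
    const size_t skip_end = SKETCH_FLAGS_OFFSET + SKETCH_FLAGS_SIZE;
    uint32_t    crc = 0xFFFFFFFFu;

    assert(page_size >= skip_end);

    /* Bytes before the flags ... */
    crc = sketch_crc32_update(crc, page, SKETCH_FLAGS_OFFSET);
    /* ... then everything after them. */
    crc = sketch_crc32_update(crc, page + skip_end, page_size - skip_end);

    return crc ^ 0xFFFFFFFFu;
}
```

With this, two page images that differ only in the skipped bytes get
the same CRC, while a change anywhere else is still detected.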
So what's still missing or broken:
- btree hint bits
- bgwriter calls XLogInsert during shutdown, to WAL-log the hint bits
  of unwritten pages. This trips the PANIC check for concurrent WAL
  activity during checkpoint. (The easy solution to this problem is
  just to remove the check; another idea is to flush the buffers before
  grabbing the final address to watch for at shutdown.)
> I'm beginning to think just excluding the hint bits would be simpler and
> safer. If we're double buffering then it might be possible to do that
> pretty cheaply. Copy the whole buffer with memcpy then loop through the
> line pointers unsetting the hint bits. Then do the crc. Though that would
> prevent us from doing "zero-copy" crc by doing it in the copy.
This can probably be made to work, and it also solves the problem of
bgwriter calling XLogInsert during shutdown. I would create new routines
to clear hint bits in all involved modules (heap_resethintbits, and
likewise btree_%, item_%, page_%), and call them on a copy of the page.
The downside to this idea is that it costs us on the read side too: to
verify the CRC when we read a page in, we need to make a copy of the
page and call those same routines on it.
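
As a rough sketch of the copy-and-clear idea (again, not actual patch
code), both the write side and the read side would go through the same
"copy, reset hint bits, CRC the copy" path. Everything here is
simplified and hypothetical: SketchPage and SketchTuple are toy structs
rather than the real page layout, only infomask is handled, and the
0x0F00 mask is assumed to cover the four xmin/xmax committed/invalid
hint bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NTUPLES   4
#define SKETCH_HINT_MASK 0x0F00u    /* assumed xmin/xmax hint bits */

/* Toy stand-ins for a tuple header and a page; no padding bytes. */
typedef struct SketchTuple
{
    uint16_t    infomask;
    uint16_t    infomask2;
    uint32_t    payload;
} SketchTuple;

typedef struct SketchPage
{
    uint16_t    ntuples;
    uint16_t    unused;
    SketchTuple tuples[SKETCH_NTUPLES];
} SketchPage;

/* Plain bitwise CRC-32 over a byte range. */
static uint32_t
sketch_crc32(const unsigned char *data, size_t len)
{
    uint32_t    crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
    {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical analogue of heap_resethintbits: clear hints in place. */
static void
sketch_heap_resethintbits(SketchPage *page)
{
    assert(page->ntuples <= SKETCH_NTUPLES);
    for (int i = 0; i < page->ntuples; i++)
        page->tuples[i].infomask &= (uint16_t) ~SKETCH_HINT_MASK;
}

/* Write side: CRC a hint-free private copy; the original is untouched. */
uint32_t
sketch_page_crc(const SketchPage *page)
{
    SketchPage  copy;

    memcpy(&copy, page, sizeof(copy));
    sketch_heap_resethintbits(&copy);
    return sketch_crc32((const unsigned char *) &copy, sizeof(copy));
}

/* Read side: the same copy-and-clear is needed before comparing. */
int
sketch_page_verify(const SketchPage *page, uint32_t stored_crc)
{
    return sketch_page_crc(page) == stored_crc;
}
```

Since the hint bits never feed into the CRC, setting one on a page
after its CRC was computed does not invalidate it, so nothing needs to
be WAL-logged for that change. The price is the extra memcpy and loop
on every read as well as every write.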
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support