
Re: crash-safe visibility map, take four - Mailing list pgsql-hackers

From	高增琦
Subject	Re: crash-safe visibility map, take four
Date	March 31, 2011 05:34:07
Msg-id	AANLkTin+hWh8QE83XjN9J1br4Qn7_qYwQY-vGWA-nduQ@mail.gmail.com
In response to	Re: crash-safe visibility map, take four (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses	Re: crash-safe visibility map, take four
List	pgsql-hackers


On Wed, Mar 30, 2011 at 8:52 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

On 30.03.2011 06:24, 高增琦 wrote:
Should we do full-page writes for the visibility map all the time?
Currently, when a visibility map bit is cleared, there is no full-page write for the vm page,
since we don't save the vm buffer info in the insert/update/delete WAL record.

The full-page write is used to protect pages from disk failure. Without it,
1) set vm: the vm bits that should be set to 1 may still be 0
2) clear vm: the vm bits that should be set to 0 may still be 1
Are these true? Or is the page content totally unpredictable?

Not quite. The WAL replay will set or clear vm bits, regardless of full page writes. Full page writes protect from torn pages, i.e. the problem where some operations on a page have made it to disk while others have not. That's not a problem for VM pages, as each bit on the page can be set or cleared individually. But for something like a heap page, where you have an offset at the beginning of the page that points to the tuple elsewhere on the page, you have to ensure that they stay in sync, even if you don't otherwise care whether the update makes it to disk or not.
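
To make the distinction above concrete, here is a toy Python model of a torn write. It is only a sketch: the 512-byte sector size, the two-byte "tuple offset" header, and all helper names are assumptions for illustration and do not reflect PostgreSQL's real page layout.

```python
SECTOR = 512   # assumed atomic write unit of the disk
PAGE = 8192    # PostgreSQL's default block size


def torn_write(old: bytes, new: bytes, sectors_written: int) -> bytes:
    """On-disk image when only the first `sectors_written` sectors of `new` land."""
    cut = sectors_written * SECTOR
    return new[:cut] + old[cut:]


def make_heap_page(tuple_offset: int, payload: bytes) -> bytes:
    """Hypothetical heap-like page: bytes 0..1 hold the offset of the tuple."""
    page = bytearray(PAGE)
    page[0:2] = tuple_offset.to_bytes(2, "little")
    page[tuple_offset:tuple_offset + len(payload)] = payload
    return bytes(page)


def read_tuple(page: bytes, length: int) -> bytes:
    """Follow the header offset to the tuple bytes."""
    offset = int.from_bytes(page[0:2], "little")
    return page[offset:offset + length]


# Heap-like page: the header and the tuple live in different sectors.
old_heap = make_heap_page(8000, b"old!")
new_heap = make_heap_page(7000, b"new!")
torn_heap = torn_write(old_heap, new_heap, sectors_written=1)
# The new header points at offset 7000, but that sector still holds old bytes.
torn_tuple = read_tuple(torn_heap, 4)

# Bitmap page: no byte refers to any other byte.
old_vm = bytes([0xFF]) * PAGE
new_vm = bytearray(old_vm)
new_vm[10] = 0xFE      # clear one bit in sector 0
new_vm[6000] = 0x7F    # clear one bit in sector 11
torn_vm = torn_write(old_vm, bytes(new_vm), sectors_written=1)
```

In this model the torn heap page yields a tuple that is neither the old nor the new one, while every byte of the torn bitmap page holds either its old or its new value. The change in sector 11 is still missing from disk, which is the part WAL replay is expected to redo.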

Consider an example:
1. deletes on two pages emit two log records: (1, page1, vm_clear_1), (2, page2, vm_clear_2)
2. "vm_clear_1" and "vm_clear_2" are on the same vm page
3. checkpoint, and the vm page gets torn: vm_clear_2 is lost
4. a delete on another page emits one log record (3, page1, vm_clear_3); vm_clear_3 is still on that vm page
5. power down
6. startup: redo replays all changes after the checkpoint, but the bit for vm_clear_2 will never be cleared
Am I right?
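
The six steps above can be walked through with a small Python sketch. It is a toy model under stated assumptions, not PostgreSQL code: the vm page is a dict of bits, the torn write is modelled as one bit not reaching disk while the checkpoint still counts as complete, the third delete is assumed to touch a third heap block, and `run` and `full_page_write` are hypothetical names. The flag models attaching a full page image to the first record that touches the vm page after the checkpoint.

```python
def run(full_page_write: bool) -> dict:
    """Replay the scenario; return the vm bits seen after crash recovery."""
    disk = {1: True, 2: True, 3: True}   # on-disk vm bits (True = all-visible)
    buf = dict(disk)                     # in-memory copy of the vm page
    wal = []                             # (lsn, heap block, optional page image)

    def clear_bit(lsn, blk, first_after_checkpoint=False):
        buf[blk] = False
        take_image = full_page_write and first_after_checkpoint
        wal.append((lsn, blk, dict(buf) if take_image else None))

    # Steps 1-2: two deletes clear two bits on the same vm page.
    clear_bit(1, 1)
    clear_bit(2, 2)

    # Step 3: checkpoint writes the vm page, but the part holding the bit
    # for block 2 never reaches disk (the assumed torn write).
    for blk, bit in buf.items():
        if blk != 2:
            disk[blk] = bit
    redo_lsn = 3                         # recovery starts after the checkpoint

    # Step 4: another delete touches the same vm page.
    clear_bit(3, 3, first_after_checkpoint=True)

    # Steps 5-6: power loss drops `buf`; redo starts from the disk image.
    recovered = dict(disk)
    for lsn, blk, image in wal:
        if lsn < redo_lsn:
            continue                     # records before the checkpoint are skipped
        if image is not None:
            recovered = dict(image)      # a full page image replaces the page
        else:
            recovered[blk] = False
    return recovered


without_fpw = run(full_page_write=False)   # {1: False, 2: True, 3: False}
with_fpw = run(full_page_write=True)       # {1: False, 2: False, 3: False}
```

Under these assumptions the bit for block 2 stays set without a full page image and is repaired with one. Whether a torn page can really survive a completed checkpoint in this way is exactly the open question of the example.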

Another question:
To address the problem described in
http://archives.postgresql.org/pgsql-hackers/2010-02/msg02097.php
should we just clear the vm bit before writing the WAL record for insert/update/delete?
This may reduce performance; is there another solution?

Yeah, that's a straightforward way to fix it. I don't think the performance hit will be too bad. But we need to be careful not to hold locks while doing I/O, which might require some rearrangement of the code. We might want to do a similar dance that we do in vacuum, and call visibilitymap_pin first, then lock and update the heap page, and then set the VM bit while holding the lock on the heap page.
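
The ordering described above (pin first, then lock, then touch the VM bit) can be sketched in Python as follows. This is an illustration only: `visibilitymap_pin` here is a toy stand-in that shares just its name with the function mentioned above, and `update_heap_and_vm`, `vm_cache`, `events`, and the eight-blocks-per-vm-page layout are all hypothetical.

```python
import threading

heap_lock = threading.Lock()   # stands in for the heap page's buffer lock
vm_cache = {}                  # vm pages already pinned in memory
events = []                    # (event name, was the heap lock held?)


def visibilitymap_pin(vm_block: int) -> bytearray:
    """Toy stand-in: bring the vm page into memory, possibly doing I/O."""
    if vm_block not in vm_cache:
        events.append(("io", heap_lock.locked()))
        vm_cache[vm_block] = bytearray([0xFF])   # 8 heap blocks per toy vm page
    return vm_cache[vm_block]


def update_heap_and_vm(heap_block: int, all_visible: bool) -> None:
    # 1. Pin the vm page before taking any lock, so the I/O happens unlocked.
    vm_page = visibilitymap_pin(heap_block // 8)
    mask = 1 << (heap_block % 8)
    # 2. Lock the heap page and modify it.
    with heap_lock:
        events.append(("heap_update", heap_lock.locked()))
        # 3. Change the vm bit while still holding the heap page lock.
        if all_visible:
            vm_page[0] |= mask
        else:
            vm_page[0] &= ~mask & 0xFF
        events.append(("vm_update", heap_lock.locked()))


update_heap_and_vm(heap_block=3, all_visible=False)
```

After the call, `events` records the I/O with the lock not held and both updates with the lock held, which is the property the rearrangement is meant to guarantee.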

Do you mean we should lock the heap page first, get the block number, release the heap page lock,
then pin the vm page, and then lock both the heap page and the vm page?
As Robert Haas said, when we lock the heap page again, there may not be enough free space on it.
Is there a way to just hold off the checkpoint for a while?

Thanks.
GaoZengqi
