
Re: crash-safe visibility map, take four - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: crash-safe visibility map, take four
Date	March 31, 2011 07:31:41
Msg-id	4D9457FC.6010700@enterprisedb.com
In response to	Re: crash-safe visibility map, take four (高增琦 <pgf00a@gmail.com>)
List	pgsql-hackers


On 31.03.2011 11:33, 高增琦 wrote:
> Consider an example:
> 1. delete on two pages, emits two log records (1, page1, vm_clear_1), (2, page2,
> vm_clear_2)
> 2. "vm_clear_1" and "vm_clear_2" are on the same vm page
> 3. checkpoint, and the vm page gets torn, vm_clear_2 is lost
> 4. delete on another page, emits one log record (3, page1, vm_clear_3), vm_clear_3
> is still on that vm page
> 5. power down
> 6. startup, redo replays all changes after the checkpoint, but vm_clear_2
> will never be cleared
> Am I right?

No. A page can only be torn at a hard crash, i.e. at step 5. A checkpoint
flushes all changes to disk; once the checkpoint finishes, all the
changes before it are safe on disk.

If you crashed between steps 2 and 3, the VM page might be torn so that
only one of the vm_clears has made it to disk and the other has not. But
the WAL records for both are on disk anyway, so that will be corrected
at replay.
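To illustrate why replay repairs a VM page torn by a crash between steps 2 and 3, here is a toy model in C. All names are hypothetical and this is not PostgreSQL code: the VM page is a small bitmap written in two "sectors", a torn write persists only the first sector, and redo re-applies every vm_clear record at or after the checkpoint's redo LSN. Clearing a bit is idempotent, so re-applying a clear that already reached disk does no harm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model with hypothetical names, not PostgreSQL code: a VM page is a
 * bitmap with one all-visible bit per heap block, written in two "sectors". */
#define VM_PAGE_BYTES 8
#define SECTOR_BYTES  4

typedef struct {
    uint8_t bytes[VM_PAGE_BYTES];
} VmPage;

/* A WAL record meaning "clear the all-visible bit of this heap block". */
typedef struct {
    uint64_t lsn;
    int      heap_block;
} VmClearRecord;

static void vm_clear(VmPage *vm, int heap_block)
{
    assert(heap_block >= 0 && heap_block < VM_PAGE_BYTES * 8);
    vm->bytes[heap_block / 8] &= (uint8_t) ~(1u << (heap_block % 8));
}

static bool vm_is_set(const VmPage *vm, int heap_block)
{
    return (vm->bytes[heap_block / 8] >> (heap_block % 8)) & 1u;
}

/* Torn write: only the first nsectors sectors of the in-memory page reach
 * disk; the rest of the on-disk page keeps its old contents. */
static void torn_write(VmPage *disk, const VmPage *mem, int nsectors)
{
    assert(nsectors >= 0 && nsectors * SECTOR_BYTES <= VM_PAGE_BYTES);
    memcpy(disk->bytes, mem->bytes, (size_t) nsectors * SECTOR_BYTES);
}

/* Redo: re-apply every record at or after the checkpoint's redo LSN.
 * Clearing a bit is idempotent, so re-applying a clear that already made
 * it to disk is harmless. */
static void replay(VmPage *disk, const VmClearRecord *wal, int nrecords,
                   uint64_t redo_lsn)
{
    for (int i = 0; i < nrecords; i++)
        if (wal[i].lsn >= redo_lsn)
            vm_clear(disk, wal[i].heap_block);
}
```

For instance, clearing heap blocks 3 and 40 in memory and then writing only the first sector leaves block 40's bit still set on disk; replaying both records from the redo point clears it, and the on-disk page matches memory again.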

>>   Another question:
>>> To address the problem in
>>> http://archives.postgresql.org/pgsql-hackers/2010-02/msg02097.php
>>> , should we just clear the vm before the log of insert/update/delete?
>>> This may reduce performance. Is there another solution?
>>>
>>
>> Yeah, that's a straightforward way to fix it. I don't think the performance
>> hit will be too bad. But we need to be careful not to hold locks while doing
>> I/O, which might require some rearrangement of the code. We might want to do
>> the same dance that we do in vacuum: call visibilitymap_pin first, then
>> lock and update the heap page, and then set the VM bit while holding the
>> lock on the heap page.
>>
> Do you mean we should lock the heap page first, then get the block number,
> then release the heap page,
> then pin the vm page, then lock both the heap page and the vm page?
> As Robert Haas said, when we lock the heap page again, there may not be enough
> free space on it.

I think the sequence would have to be:

1. Pin the heap page.
2. Check whether the all-visible flag is set on the heap page (without a lock).
If it is, pin the vm page.
3. Lock the heap page and check that it has enough free space.
4. Check again whether the all-visible flag is set. If it is but we didn't
pin the vm page yet, release the lock and loop back to step 2.
5. Update the heap page.
6. Update the vm page.
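The six steps above can be sketched as a single-threaded toy model in C. Every name here (HeapPage, VmPage, heap_insert_sketch, the race hook) is hypothetical and not PostgreSQL's buffer manager API. The "lock" is just a flag, and an assertion in vm_pin enforces the rule that the VM page is never pinned (which may mean I/O) while the heap page is locked. The race hook simulates another backend setting the all-visible flag between the unlocked check and taking the lock, which forces the loop back to step 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Single-threaded toy model; every name here is hypothetical. */
typedef struct {
    bool locked;       /* stands in for the heap buffer's exclusive lock */
    bool all_visible;  /* stands in for the page-level all-visible flag */
    int  free_space;
} HeapPage;

typedef struct {
    bool pinned;       /* VM page pinned in a shared buffer */
    bool bit_set;      /* this heap page's all-visible bit in the VM */
} VmPage;

/* Simulates another backend changing the page between steps 2 and 3. */
typedef void (*RaceHook)(HeapPage *heap, VmPage *vm);

static void heap_lock(HeapPage *p)   { assert(!p->locked); p->locked = true; }
static void heap_unlock(HeapPage *p) { assert(p->locked);  p->locked = false; }

/* Pinning the VM page may have to read it from disk, so the assertion
 * enforces that it never happens while the heap page is locked. */
static void vm_pin(VmPage *vm, const HeapPage *heap)
{
    assert(!heap->locked);
    vm->pinned = true;
}

/* Step 1 (pinning the heap page) is assumed done by the caller.
 * Returns false if the page lacks room for tuple_size bytes. */
static bool heap_insert_sketch(HeapPage *heap, VmPage *vm, int tuple_size,
                               RaceHook race, int *attempts)
{
    *attempts = 0;
    for (;;) {
        (*attempts)++;
        /* Step 2: unlocked check; pin the VM page if the flag is set. */
        if (heap->all_visible && !vm->pinned)
            vm_pin(vm, heap);
        if (race != NULL) {          /* fire the simulated race only once */
            race(heap, vm);
            race = NULL;
        }
        /* Step 3: lock the heap page and check free space. */
        heap_lock(heap);
        if (heap->free_space < tuple_size) {
            heap_unlock(heap);
            return false;            /* caller would try another page */
        }
        /* Step 4: recheck under the lock; retry if we missed the pin. */
        if (heap->all_visible && !vm->pinned) {
            heap_unlock(heap);
            continue;
        }
        break;
    }
    /* Step 5: update the heap page (WAL-logging omitted in this toy). */
    bool was_all_visible = heap->all_visible;
    heap->free_space -= tuple_size;
    heap->all_visible = false;
    /* Step 6: clear the VM bit while still holding the heap page lock. */
    if (was_all_visible) {
        assert(vm->pinned);
        vm->bit_set = false;
    }
    heap_unlock(heap);
    return true;
}

/* Example race: a concurrent VACUUM marks the page all-visible. */
static void simulate_vacuum_race(HeapPage *heap, VmPage *vm)
{
    heap->all_visible = true;
    vm->bit_set = true;
}
```

With simulate_vacuum_race, the first pass sees the flag clear and skips pinning, finds the flag set under the lock, releases the lock and retries; the second pass pins the VM page before locking and completes, so attempts ends up as 2. Doing the cheap unlocked check first means the common case (page not all-visible) pays no extra pin.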

> Is there a way to just stop the checkpoint for a while?

Not at the moment. It wouldn't be hard to add, though. I was about to 
add a mechanism for that last autumn to fix a similar issue with b-tree
parent pointer updates 
(http://archives.postgresql.org/message-id/4CCFEE61.2090702@enterprisedb.com), 
but in the end it was solved differently.
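For illustration only, such a mechanism might look like the sketch below: a shared counter of backends that are between WAL-logging a heap change and clearing the VM bit, which the checkpointer waits to drain before choosing its redo point. This is a hypothetical design in C with POSIX threads; none of these names exist in PostgreSQL, and a real implementation could look quite different.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical shared state; not PostgreSQL code. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  no_delayers;
    int             ndelaying;   /* backends currently holding off the redo point */
} CheckpointDelay;

static void checkpoint_delay_init(CheckpointDelay *cd)
{
    pthread_mutex_init(&cd->mutex, NULL);
    pthread_cond_init(&cd->no_delayers, NULL);
    cd->ndelaying = 0;
}

/* Backend: call before WAL-logging the heap change. */
static void delay_checkpoint_begin(CheckpointDelay *cd)
{
    pthread_mutex_lock(&cd->mutex);
    cd->ndelaying++;
    pthread_mutex_unlock(&cd->mutex);
}

/* Backend: call after the VM bit has been cleared. */
static void delay_checkpoint_end(CheckpointDelay *cd)
{
    pthread_mutex_lock(&cd->mutex);
    assert(cd->ndelaying > 0);
    if (--cd->ndelaying == 0)
        pthread_cond_broadcast(&cd->no_delayers);
    pthread_mutex_unlock(&cd->mutex);
}

/* Checkpointer: wait until no backend is inside a delaying section, then
 * read the current WAL insert position as the redo point while still
 * holding the mutex, so no backend can enter a section (and insert WAL
 * below the redo point) in between. */
static uint64_t checkpointer_choose_redo(CheckpointDelay *cd,
                                         const uint64_t *insert_lsn)
{
    pthread_mutex_lock(&cd->mutex);
    while (cd->ndelaying > 0)
        pthread_cond_wait(&cd->no_delayers, &cd->mutex);
    uint64_t redo = *insert_lsn;
    pthread_mutex_unlock(&cd->mutex);
    return redo;
}
```

One thing a real design would also have to handle: as written, a steady stream of overlapping backends could keep ndelaying above zero and starve the checkpointer indefinitely.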

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com
