Thread: Dead Space Map version 2
This is the second proposal for the Dead Space Map (DSM). Here is the previous discussion:
http://archives.postgresql.org/pgsql-hackers/2006-12/msg01188.php

I'll post the next version of the Dead Space Map patch to -patches. I've
implemented the 2bits/page bitmap and new vacuum commands. Memory management
and recovery features are not done yet.

I think it's better to get DSM and HOT together. DSM is good at complex
updated cases but not at heavily updated cases. HOT has opposite aspects,
as far as I can see. I think they can cover each other.

2bits/page bitmap
-----------------

Each heap page has one of 4 states in the dead space map: HIGH, LOW,
UNFROZEN and FROZEN. VACUUM uses the states to reduce the number of target
pages.

- HIGH     : High priority to vacuum. Maybe many dead tuples in the page.
- LOW      : Low priority to vacuum. Maybe few dead tuples in the page.
- UNFROZEN : No dead tuples, but some unfrozen tuples in the page.
- FROZEN   : No dead nor unfrozen tuples in the page.

If we UPDATE a tuple, the original page containing the tuple is marked as
HIGH and the new page where the updated tuple is placed is marked as LOW.
When we commit the transaction, the updated tuples need only FREEZE;
that's why the after-page is marked as LOW. However, if we roll back, the
after-page needs to be vacuumed, so we should mark the page LOW, not
UNFROZEN -- we don't know at UPDATE time whether the transaction will
commit or roll back.

If we combine this with the HOT patch, pages with HOT tuples are probably
marked as UNFROZEN, because we don't bother vacuuming HOT tuples. They can
be removed incrementally and don't require explicit vacuums.

As future work, we can do index-only scans for tuples that are in UNFROZEN
or FROZEN pages. (Currently not implemented.)

VACUUM commands
---------------

VACUUM now only scans the pages that possibly have dead tuples.
VACUUM ALL, a new syntax, behaves the same as before.

- VACUUM FULL : Not changed. Scans all pages and compacts them.
- VACUUM ALL  : Scans all pages; behaves the same as the previous VACUUM.
- VACUUM      : Usually scans only HIGH pages, but also LOW and UNFROZEN
                pages on vacuums for preventing XID wraparound.

The guarantee about the oldest XID for VACUUM is not changed: there should
be no tuples whose XIDs are older than (current XID - vacuum_freeze_min_age)
after VACUUM. If VACUUM can keep that guarantee, it scans only HIGH pages.
Otherwise, it scans HIGH, LOW and UNFROZEN pages for FREEZE.

Performance issues
------------------

* Enable/disable DSM tracking per table
  DSM adds some overhead. If we know specific tables where DSM does not
  work well, e.g. heavily updated small tables, we can disable DSM for them.
  The syntax is:
    ALTER TABLE name SET (dsm=true/false);

* Dead space state cache
  The DSM management module is guarded by one LWLock, DeadSpaceLock.
  Almost all accesses to DSM require only a shared lock, but the frequency
  of shared locking was very high (tied with BufMappingLock) in my research.
  To avoid the lock contention, I added a cache of the dead space state in
  the BufferDesc flags. Backends check the flags first, and avoid taking
  the lock when it is not needed.

* Aggressive freezing
  We will freeze tuples in dirty pages using OldestXmin but FreezeLimit.
  This is for making FROZEN pages rather than UNFROZEN pages as far as
  possible, in order to reduce the work in XID-wraparound vacuums.

Memory management
-----------------

In the current implementation, DSM allocates a bunch of memory at startup
and we cannot resize it while running. That is probably enough because DSM
consumes very little memory -- 32MB of memory per 1TB of database.

There are 3 parameters for FSM and DSM:

- max_fsm_pages = 204800
- max_fsm_relations = 1000 (= max_dsm_relations)
- max_dsm_pages = 4096000

I'm thinking of changing them into 2 new parameters. We would allocate
memory for DSM that can hold all of estimated_database_size, and for FSM
50% or so of that size. Is this reasonable?

- estimated_max_relations = 1000
- estimated_database_size = 4GB (= about max_fsm_pages * 8KB * 2)

Recovery
--------

I already have a recovery extension. However, it can recover DSM but not
FSM. Do we also need to restore FSM? If we don't, unreusable pages might be
left in heaps. Of course a page could be reused if another tuple in it is
updated, but VACUUM will not find those pages.

Comments and suggestions are really appreciated.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
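To make the 2bits/page idea concrete, here is a minimal sketch -- not taken
from the patch, all names are hypothetical -- of how the four page states
could be packed into a byte array (2 bits per 8KB page is what gives the
32MB-per-1TB figure above):

#include <stdint.h>

/* Hypothetical 2-bits-per-page dead space map; four pages share one byte. */
typedef enum
{
    DSM_FROZEN   = 0,           /* no dead and no unfrozen tuples */
    DSM_UNFROZEN = 1,           /* no dead tuples, some unfrozen ones */
    DSM_LOW      = 2,           /* low priority: few dead tuples expected */
    DSM_HIGH     = 3            /* high priority: many dead tuples expected */
} DSMPageState;

static void
dsm_set_state(uint8_t *map, uint32_t blkno, DSMPageState state)
{
    uint32_t byteno = blkno / 4;
    uint32_t shift  = (blkno % 4) * 2;

    map[byteno] = (uint8_t) ((map[byteno] & ~(0x3u << shift)) |
                             ((uint32_t) state << shift));
}

static DSMPageState
dsm_get_state(const uint8_t *map, uint32_t blkno)
{
    return (DSMPageState) ((map[blkno / 4] >> ((blkno % 4) * 2)) & 0x3);
}

Under the rules described above, an UPDATE would set the old tuple's block
to DSM_HIGH and the new tuple's block to DSM_LOW; a plain VACUUM would then
visit only DSM_HIGH blocks, and a wraparound vacuum everything except
DSM_FROZEN.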
On Tue, Feb 27, 2007 at 12:05:57PM +0900, ITAGAKI Takahiro wrote:
> Each heap page has one of 4 states in the dead space map: HIGH, LOW,
> UNFROZEN and FROZEN. VACUUM uses the states to reduce the number of target
> pages.
>
> - HIGH     : High priority to vacuum. Maybe many dead tuples in the page.
> - LOW      : Low priority to vacuum. Maybe few dead tuples in the page.
> - UNFROZEN : No dead tuples, but some unfrozen tuples in the page.
> - FROZEN   : No dead nor unfrozen tuples in the page.
>
> If we UPDATE a tuple, the original page containing the tuple is marked
> as HIGH and the new page where the updated tuple is placed is marked as LOW.

Don't you mean UNFROZEN?

> When we commit the transaction, the updated tuples need only FREEZE;
> that's why the after-page is marked as LOW. However, if we roll back, the
> after-page needs to be vacuumed, so we should mark the page LOW, not
> UNFROZEN -- we don't know at UPDATE time whether the transaction will
> commit or roll back.

What makes it more important to mark the original page as HIGH instead
of LOW, like the page with the new tuple? The description of the states
indicates that there would likely be a lot more dead tuples in a HIGH
page than in a LOW page.

Perhaps it would be better to have the bgwriter take a look at how many
dead tuples there are (or how much space the dead tuples account for) when
it writes a page out, and adjust the DSM at that time.

> * Aggressive freezing
>   We will freeze tuples in dirty pages using OldestXmin but FreezeLimit.
>   This is for making FROZEN pages rather than UNFROZEN pages as far as
>   possible, in order to reduce the work in XID-wraparound vacuums.

Do you mean using OldestXmin instead of FreezeLimit? Perhaps it might be
better to save that optimization for later...

> In the current implementation, DSM allocates a bunch of memory at startup
> and we cannot resize it while running. That is probably enough because DSM
> consumes very little memory -- 32MB of memory per 1TB of database.
>
> There are 3 parameters for FSM and DSM:
>
> - max_fsm_pages = 204800
> - max_fsm_relations = 1000 (= max_dsm_relations)
> - max_dsm_pages = 4096000
>
> I'm thinking of changing them into 2 new parameters. We would allocate
> memory for DSM that can hold all of estimated_database_size, and for FSM
> 50% or so of that size. Is this reasonable?

I don't think so, at least not until we get data from the field about
what's typical. If the DSM is tracking every page in the cluster then
I'd expect the FSM to be closer to 10% or 20% of that, anyway.

> I already have a recovery extension. However, it can recover DSM but not
> FSM. Do we also need to restore FSM? If we don't, unreusable pages might be
> left in heaps. Of course a page could be reused if another tuple in it is
> updated, but VACUUM will not find those pages.

Yes, DSM would make FSM recovery more important, but I thought it was
recoverable now? Or is that only on a clean shutdown?

I suspect we don't need perfect recoverability... theoretically we could
just commit the FSM after vacuum frees pages and leave it at that; if we
revert to that after a crash, backends will grab pages from the FSM only
to find there's no more free space, at which point they could pull the
page from the FSM and find another one. This would lead to degraded
performance for a while after a crash, but that might be a good trade-off.
--
Jim Nasby                    jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
"Jim C. Nasby" <jim@nasby.net> writes: > Yes, DSM would make FSM recovery more important, but I thought it was > recoverable now? Or is that only on a clean shutdown? Currently we throw away FSM during any non-clean restart. This is probably overkill but I'm quite unclear what would be a safe alternative. > I suspect we don't need perfect recoverability... The main problem with the levels proposed by Takahiro-san is that any transition from FROZEN to not-FROZEN *must* be exactly recovered, because vacuum will never visit an allegedly frozen page at all. This appears to require WAL-logging DSM state changes, which is a pretty serious performance hit. I'd be happier if the DSM content could be treated as just a hint. I think that means not trusting it for whether a page is frozen to the extent of not needing vacuum even for wraparound. So I'm inclined to propose that there be only two states (hence only one DSM bit per page): page needs vacuum for space recovery, or not. Vacuum for XID wraparound would have to hit every page regardless. regards, tom lane
On Tue, 2007-02-27 at 12:05 +0900, ITAGAKI Takahiro wrote: > I think it's better to get DSM and HOT together. DSM is good at > complex updated cases but not at heavily updated cases. HOT has > opposite aspects, as far as I can see. I think they can cover each > other. Very much agreed. I'll be attempting to watch for any conflicting low-level assumptions as we progress towards deadline. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
On Tue, 2007-02-27 at 12:05 +0900, ITAGAKI Takahiro wrote:
> If we combine this with the HOT patch, pages with HOT tuples are probably
> marked as UNFROZEN, because we don't bother vacuuming HOT tuples. They can
> be removed incrementally and don't require explicit vacuums.

Perhaps avoid DSM entries for HOT updates completely?

> VACUUM commands
> ---------------
>
> VACUUM now only scans the pages that possibly have dead tuples.
> VACUUM ALL, a new syntax, behaves the same as before.
>
> - VACUUM FULL : Not changed. Scans all pages and compacts them.
> - VACUUM ALL  : Scans all pages; behaves the same as the previous VACUUM.
> - VACUUM      : Usually scans only HIGH pages, but also LOW and UNFROZEN
>                 pages on vacuums for preventing XID wraparound.

Sounds good.

> Performance issues
> ------------------
>
> * Enable/disable DSM tracking per table
>   DSM adds some overhead. If we know specific tables where DSM does not
>   work well, e.g. heavily updated small tables, we can disable DSM for them.
>   The syntax is:
>     ALTER TABLE name SET (dsm=true/false);

How about a dsm_tracking_limit GUC? (Better name please)
The number of pages in a table before we start tracking DSM entries for
it. DSM only gives worthwhile benefits for larger tables anyway, so let
the user define what large means for them.
dsm_tracking_limit = 1000 by default.

> * Dead space state cache
>   The DSM management module is guarded by one LWLock, DeadSpaceLock.
>   Almost all accesses to DSM require only a shared lock, but the frequency
>   of shared locking was very high (tied with BufMappingLock) in my research.
>   To avoid the lock contention, I added a cache of the dead space state in
>   the BufferDesc flags. Backends check the flags first, and avoid taking
>   the lock when it is not needed.

ISTM there should be a point at which DSM is so full we don't bother to
keep track any longer, so we can drop that information. For example if the
user runs UPDATE without a WHERE clause, there's no point in tracking the
whole relation.

> Memory management
> -----------------
>
> In the current implementation, DSM allocates a bunch of memory at startup
> and we cannot resize it while running. That is probably enough because DSM
> consumes very little memory -- 32MB of memory per 1TB of database.

That sounds fine.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
On Tue, 2007-02-27 at 00:55 -0500, Tom Lane wrote:
> "Jim C. Nasby" <jim@nasby.net> writes:
> > Yes, DSM would make FSM recovery more important, but I thought it was
> > recoverable now? Or is that only on a clean shutdown?
>
> Currently we throw away FSM during any non-clean restart. This is
> probably overkill but I'm quite unclear what would be a safe
> alternative.
>
> > I suspect we don't need perfect recoverability...
>
> The main problem with the levels proposed by Takahiro-san is that any
> transition from FROZEN to not-FROZEN *must* be exactly recovered,
> because vacuum will never visit an allegedly frozen page at all. This
> appears to require WAL-logging DSM state changes, which is a pretty
> serious performance hit. I'd be happier if the DSM content could be
> treated as just a hint. I think that means not trusting it for whether
> a page is frozen to the extent of not needing vacuum even for
> wraparound.

Agreed.

> So I'm inclined to propose that there be only two states
> (hence only one DSM bit per page): page needs vacuum for space recovery,
> or not. Vacuum for XID wraparound would have to hit every page
> regardless.

I'm inclined to think: this close to deadline it would be more robust to
go with the simpler option. So, agreed to the one bit per page.

We can revisit the 2 bits/page idea easily for later releases. If the DSM
is non-transactional, upgrading to a new format in the future should be
very easy.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
"Jim C. Nasby" <jim@nasby.net> wrote: > > If we do UPDATE a tuple, the original page containing the tuple is marked > > as HIGH and the new page where the updated tuple is placed is marked as LOW. > > Don't you mean UNFROZEN? No, the new tuples are marked as LOW. I intend to use UNFROZEN and FROZEN pages as "all tuples in the pages are visible to all transactions" for index-only-scan in the future. > What makes it more important to mark the original page as HIGH instead > of LOW, like the page with the new tuple? The description of the states > indicates that there would likely be a lot more dead tuples in a HIGH > page than in a LOW page. > > Perhaps it would be better to have the bgwriter take a look at how many > dead tuples (or how much space the dead tuples account for) when it > writes a page out and adjust the DSM at that time. Yeah, I feel it is worth optimizable, too. One question is, how we treat dirty pages written by backends not by bgwriter? If we want to add some works in bgwriter, do we also need to make bgwriter to write almost of dirty pages? > > * Agressive freezing > > We will freeze tuples in dirty pages using OldestXmin but FreezeLimit. > > Do you mean using OldestXmin instead of FreezeLimit? Yes, we will use OldestXmin as the threshold to freeze tuples in dirty pages or pages that have some dead tuples. Or, many UNFROZEN pages still remain after vacuum and they will cost us in the next vacuum preventing XID wraparound. > > I'm thinking to change them into 2 new paramaters. We will allocates memory > > for DSM that can hold all of estimated_database_size, and for FSM 50% or > > something of the size. Is this reasonable? > > I don't think so, at least not until we get data from the field about > what's typical. If the DSM is tracking every page in the cluster then > I'd expect the FSM to be closer to 10% or 20% of that, anyway. I'd like to add some kind of logical flavors to max_fsm_pages and max_dsm_pages. For DSM, max_dsm_pages should represent the whole database size. In the other hand, what meaning does max_fsm_pages have? (estimated_updatable_size ?) Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Vacuum for XID wraparound would have to hit every page regardless.

There is one problem at this point. If we want to guarantee that there are
no tuples whose XIDs are older than pg_class.relfrozenxid, we must scan all
pages for XID wraparound in every vacuum. So I used two thresholds for
treating XIDs, commented as follows. Do you have better ideas for this
point?

    /*
     * We use vacuum_freeze_min_age to determine whether a freeze scan is
     * needed, but half vacuum_freeze_min_age for the actual freeze limit,
     * so that vacuums for preventing XID wraparound won't occur too
     * frequently.
     */

Also, normal vacuums use the DSM and freeze-vacuums do not, so a vacuum
will sometimes take much longer than usual. Won't that surprise bother
users?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
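For what it's worth, a minimal sketch of the two-threshold idea in the
comment above, with hypothetical names and ignoring the modulo arithmetic
that real TransactionId comparisons need:

typedef unsigned int TransactionId;

/*
 * Sketch only: derive the two thresholds described above.  The scan limit
 * decides whether an anti-wraparound (full) scan is needed at all; the
 * freeze limit is the cutoff actually used to freeze tuples during it.
 */
static void
dsm_freeze_thresholds(TransactionId currentXid, int vacuum_freeze_min_age,
                      TransactionId *freeze_scan_limit,
                      TransactionId *freeze_limit)
{
    /* a full scan becomes necessary once tuples can exceed the configured age */
    *freeze_scan_limit = currentXid - vacuum_freeze_min_age;

    /*
     * ...but freeze down to half that age, so the next anti-wraparound
     * scan is pushed roughly vacuum_freeze_min_age/2 transactions away.
     */
    *freeze_limit = currentXid - vacuum_freeze_min_age / 2;
}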
"Simon Riggs" <simon@2ndquadrant.com> wrote: > > If we combine this with the HOT patch, pages with HOT tuples are probably > > marked as UNFROZEN because we don't bother vacuuming HOT tuples. They can > > be removed incrementally and doesn't require explicit vacuums. > > Perhaps avoid DSM entries for HOT updates completely? Yes, if we employ 1bit/page (worth vacuum or not). Or no if 2bits/page because HOT updates change page states to UNFROZEN. > > * Enable/Disable DSM tracking per tables > > How about a dsm_tracking_limit GUC? (Better name please) > The number of pages in a table before we start tracking DSM entries for > it. DSM only gives worthwhile benefits for larger tables anyway, so let > the user define what large means for them. > dsm_tracking_limit = 1000 by default. Sound good. How about small_table_size = 8MB for the variable? I found that we've already have the value used for truncating threshold for vacuum. (REL_TRUNCATE_MINIMUM = 1000 in vacuumlazy.c) I think they have the same purpose in treating of small tables and we can use the same variable in these places. > > * Dead Space State Cache > > ISTM there should be a point at which DSM is so full we don't bother to > keep track any longer, so we can drop that information. For example if > user runs UPDATE without a WHERE clause, there's no point in tracking > whole relation. It's a bit difficult. We have to lock DSM *before* we see whether the table is tracked or not. So we need to cache the tracked state in the relcache entry, but it requres some works to keep coherency between cached states and shared states. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Tom Lane wrote:
> The main problem with the levels proposed by Takahiro-san is that any
> transition from FROZEN to not-FROZEN *must* be exactly recovered,
> because vacuum will never visit an allegedly frozen page at all. This
> appears to require WAL-logging DSM state changes, which is a pretty
> serious performance hit.

I doubt it would be a big performance hit. AFAICS, all the information
needed to recover the DSM is already written to WAL, so it wouldn't need
any new WAL records.

> I'd be happier if the DSM content could be
> treated as just a hint. I think that means not trusting it for whether
> a page is frozen to the extent of not needing vacuum even for
> wraparound. So I'm inclined to propose that there be only two states
> (hence only one DSM bit per page): page needs vacuum for space recovery,
> or not. Vacuum for XID wraparound would have to hit every page
> regardless.

If we don't have a frozen state, we can't use the DSM to implement
index-only scans. Index-only scans will obviously require a lot more work
than just the DSM, but I'd like to have a solution that enables it in the
future.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On Tue, Feb 27, 2007 at 12:55:21AM -0500, Tom Lane wrote:
> "Jim C. Nasby" <jim@nasby.net> writes:
> > Yes, DSM would make FSM recovery more important, but I thought it was
> > recoverable now? Or is that only on a clean shutdown?
>
> Currently we throw away FSM during any non-clean restart. This is
> probably overkill but I'm quite unclear what would be a safe
> alternative.

My thought would be to revert to a FSM that has pages marked as free that
no longer are. Could be done by writing the FSM out every time we add pages
to it. After an unclean restart backends would be getting pages from the
FSM that didn't have free space, in which case they'd need to yank that
page out of the FSM and request a new one.

Granted, this means extra IO until the FSM gets back to a realistic state,
but I suspect that's better than bloating tables out until the next vacuum.
And it's ultimately less IO than re-vacuuming every table to rebuild the
FSM.
--
Jim Nasby                    jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
On Tue, Feb 27, 2007 at 05:38:39PM +0900, ITAGAKI Takahiro wrote:
> "Jim C. Nasby" <jim@nasby.net> wrote:
>
> > > If we do UPDATE a tuple, the original page containing the tuple is marked
> > > as HIGH and the new page where the updated tuple is placed is marked as LOW.
> >
> > Don't you mean UNFROZEN?
>
> No, the new tuples are marked as LOW. I intend to use UNFROZEN and FROZEN
> pages to mean "all tuples in the page are visible to all transactions" for
> index-only scans in the future.

Ahh, ok. Makes sense, though I tend to agree with others that it's better
to leave that off for now, or at least do the initial patch without it.

> > What makes it more important to mark the original page as HIGH instead
> > of LOW, like the page with the new tuple? The description of the states
> > indicates that there would likely be a lot more dead tuples in a HIGH
> > page than in a LOW page.
> >
> > Perhaps it would be better to have the bgwriter take a look at how many
> > dead tuples there are (or how much space the dead tuples account for) when
> > it writes a page out, and adjust the DSM at that time.
>
> Yeah, I feel it is worth optimizing, too. One question is, how do we treat
> dirty pages written by backends, not by the bgwriter? If we want to add
> some work to the bgwriter, do we also need to make the bgwriter write
> almost all of the dirty pages?

IMO yes, we want the bgwriter to be the only process that's normally
writing pages out. How close we are to that, I don't know...

> > > * Aggressive freezing
> > >   We will freeze tuples in dirty pages using OldestXmin but FreezeLimit.
> >
> > Do you mean using OldestXmin instead of FreezeLimit?
>
> Yes, we will use OldestXmin as the threshold to freeze tuples in dirty
> pages or pages that have some dead tuples. Otherwise, many UNFROZEN pages
> will still remain after vacuum and they will cost us in the next vacuum
> for preventing XID wraparound.

Another good idea. If it's not too invasive I'd love to see that as a
stand-alone patch so that we know it can get in.

> > > I'm thinking of changing them into 2 new parameters. We would allocate
> > > memory for DSM that can hold all of estimated_database_size, and for FSM
> > > 50% or so of that size. Is this reasonable?
> >
> > I don't think so, at least not until we get data from the field about
> > what's typical. If the DSM is tracking every page in the cluster then
> > I'd expect the FSM to be closer to 10% or 20% of that, anyway.
>
> I'd like to add some kind of logical flavor to max_fsm_pages and
> max_dsm_pages. For DSM, max_dsm_pages should represent the whole database
> size. On the other hand, what meaning does max_fsm_pages have?
> (estimated_updatable_size?)

At some point it might make sense to convert the FSM into a bitmap; that
way everything just scales with database size.

In the meantime, I'm not sure if it makes sense to tie the FSM size to the
DSM size, since each FSM page requires 48x the storage of a DSM page. I
think there's also a lot of cases where FSM size will not scale the same
way DSM size will, such as when there's historical data in the database.

That raises another question... what happens when we run out of DSM space?
--
Jim Nasby                    jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> Tom Lane wrote:
>> I'd be happier if the DSM content could be
>> treated as just a hint.

> If we don't have a frozen state, we can't use the DSM to implement
> index-only scans.

To implement index-only scans, the DSM would have to be expected to
provide 100% reliable coverage, which will increase its cost and
complexity by orders of magnitude. If you insist on that, I will bet you
lunch at a fine restaurant that it doesn't make it into 8.3.

			regards, tom lane
"Jim C. Nasby" <jim@nasby.net> wrote: > > I'd like to add some kind of logical flavors to max_fsm_pages > > and max_dsm_pages. > > In the meantime, I'm not sure if it makes sense to tie the FSM size to > the DSM size, since each FSM page requires 48x the storage of a DSM > page. I think there's also a lot of cases where FSM size will not scale > the same was DSM size will, such as when there's historical data in the > database. I see. We need separate variables for FSM and DSM. Here is a new proposal for replacements of variables at Free Space Map section in postgresql.conf. Are these changes acceptable? If ok, I'd like to rewrite codes using them. # - Space Management - managed_relations = 1000 # min 100, ~120 bytes each managed_freespaces = 2GB # 6 bytes of shared memory per 8KB managed_deadspaces = 8GB # 4KB of shared memory per 32MB managed_relations: Replacement of max_fsm_relations. It is also used by DSM. managed_freespaces: Replacement of max_fsm_pages. The meaning is not changed, but can be set in bytes. managed_deadspaces: A new parameter for DSM. It might be better to be scaled with whole database size. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
"Jim C. Nasby" <jim@nasby.net> wrote: > At some point it might make sense to convert the FSM into a bitmap; that > way everything just scales with database size. > In the meantime, I'm not sure if it makes sense to tie the FSM size to > the DSM size, since each FSM page requires 48x the storage of a DSM > page. I think there's also a lot of cases where FSM size will not scale > the same was DSM size will, such as when there's historical data in the > database. Bitmapped FSM is interesting. Maybe strict accuracy is not needed for FSM. If we change FSM to use 2 bits/page bitmaps, it requires only 1/48 shared memory by now. However, 6 bytes/page is small enough for normal use. We need to reconsider it if we would go into TB class heavily updated databases. > That raises another question... what happens when we run out of DSM > space? First, discard completely clean memory chunks in DSM. 'Clean' means all of the tuples managed by the chunk are frozen. This is a lossless transition. Second, discard tracked tables and its chunks that is least recently vacuumed. We can assume those tables have many dead tuples and almost fullscan will be required. We don't bother to keep tracking to such tables. Many optimizations should still remain at this point, but I'll make a not-so-complex suggestions in the meantime. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> Tom Lane wrote:
>>> I'd be happier if the DSM content could be
>>> treated as just a hint.
>
>> If we don't have a frozen state, we can't use the DSM to implement
>> index-only scans.
>
> To implement index-only scans, the DSM would have to be expected to
> provide 100% reliable coverage, which will increase its cost and
> complexity by orders of magnitude. If you insist on that, I will bet
> you lunch at a fine restaurant that it doesn't make it into 8.3.

:)

While I understand that 100% reliable coverage is a significantly stronger
guarantee, I don't see any particular problems in implementing that. WAL
logging isn't that hard.

I won't insist, I'm not the one doing the programming after all. Anything
is better than what we have now. However, I do hope that whatever is
implemented doesn't need a complete rewrite to make it 100% reliable in the
future.

The basic wish I have is to not use a fixed size shared memory area like
FSM for the DSM. I'd like it to use the shared buffers instead, which makes
the memory management and tuning easier. And it also makes it easier to get
the WAL logging right, even if it's not done for 8.3 but added later.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Hello, long time no see. This topic looks interesting. I'm envious of
Itagaki-san and others. I can't do now what I want, due to other work that
I don't want to do (isn't my boss seeing this?). I wish I could join the
community some day and contribute to the development like the great experts
here.

# I can't wait to try Itagaki-san's latest patch for load distributed
# checkpoint in my environment and report the result.
# But I may not have enough time...

Let me give some comments below.

From: "Heikki Linnakangas" <heikki@enterprisedb.com>
> While I understand that 100% reliable coverage is a significantly stronger
> guarantee, I don't see any particular problems in implementing that. WAL
> logging isn't that hard.
>
> I won't insist, I'm not the one doing the programming after all. Anything
> is better than what we have now. However, I do hope that whatever is
> implemented doesn't need a complete rewrite to make it 100% reliable in
> the future.
>
> The basic wish I have is to not use a fixed size shared memory area like
> FSM for the DSM. I'd like it to use the shared buffers instead, which
> makes the memory management and tuning easier. And it also makes it easier
> to get the WAL logging right, even if it's not done for 8.3 but added
> later.

I hope for the same thing as Heikki-san. Though I'm relatively new to the
PostgreSQL source code, I don't think it is very difficult (at least for
the experts here) to implement a reliable space management scheme, so I
proposed the following before -- not a separate memory area for FSM, but
treating it the same way as data files in the shared buffers.

Though Tom-san is worried about performance, what would make the
performance degrade greatly? Additional WAL records for updating space
management structures are written sequentially in batches. Additional dirty
shared buffers are written efficiently by the kernel (at least for now).
And PostgreSQL is released from the giant lwlock for FSM.

Some performance degradation would surely result. However, reliability is
more important, because "vacuum" is almost the greatest concern for real
serious users (not for hobbyists who enjoy performance). Can anybody say to
users "we are working hard, but our work may not be reliable and sometimes
fails. Can you see if our vacuuming effort failed and try this...?"

And I'm afraid that increasing the number of configuration parameters is
unacceptable for users. It is merely an excuse of developers. PostgreSQL
already has more than 100 parameters. Some of them, such as bgwriter_*, are
difficult for normal users to understand. It's best to use the
shared_buffers parameter and show how to set it in the documentation.

Addressing the vacuum problem correctly is very important. I hope you don't
introduce new parameters for unfinished work and force users to check the
manual to change the parameters in later versions, i.e. "managed_*
parameters are not supported from this release. Please use
shared_buffers..." Is it a "must" to release 8.3 by this summer? I think
delaying the release a bit for a correct (reliable) vacuum resolution is
worth it.

From: "Takayuki Tsunakawa" <tsunakawa.takay@jp.fujitsu.com>
> Yes! I'm completely in favor of Itagaki-san. Separating the cache for
> FSM may produce a new configuration parameter like fsm_cache_size,
> which normal users would not desire (unless they enjoy a difficult DBMS.)
> I think that integrating the treatment of the space management structure
> and the data area is good. That means, for example, implementing the
> "Free Space Table" described in section 14.2.2.1 of Jim Gray's book
> "Transaction Processing: Concepts and Techniques", though it may have
> been discussed in the PostgreSQL community long ago (really?). Of course,
> some refinements may be necessary to fit PostgreSQL's concepts, say,
> creating one free space table file for each data file to make the
> implementation easy. It would reduce the source code needed solely for
> FSM.
>
> In addition, it would provide transactional space management. If I
> understand correctly, in the current implementation, updates to FSM are
> lost when the server crashes, aren't they? The idea assumes that FSM will
> be rebuilt by vacuum because vacuum is inevitable. If updates to the
> space management area were made transactional, it might provide the
> infrastructure for a "vacuumless PostgreSQL."
On Wed, Feb 28, 2007 at 04:10:09PM +0900, ITAGAKI Takahiro wrote:
> "Jim C. Nasby" <jim@nasby.net> wrote:
>
> > At some point it might make sense to convert the FSM into a bitmap; that
> > way everything just scales with database size.
> > In the meantime, I'm not sure if it makes sense to tie the FSM size to the
> > DSM size, since each FSM page requires 48x the storage of a DSM page. I
> > think there's also a lot of cases where FSM size will not scale the same
> > way DSM size will, such as when there's historical data in the database.
>
> A bitmapped FSM is interesting. Maybe strict accuracy is not needed for the
> FSM. If we changed the FSM to use 2 bits/page bitmaps, it would require
> only 1/48 of the shared memory it does now. However, 6 bytes/page is small
> enough for normal use; we would need to reconsider it if we went to
> TB-class, heavily updated databases.
>
> > That raises another question... what happens when we run out of DSM
> > space?
>
> First, discard completely clean memory chunks in the DSM. 'Clean' means all
> of the tuples managed by the chunk are frozen. This is a lossless
> transition.
>
> Second, discard tracked tables and their chunks that are least recently
> vacuumed. We can assume those tables have many dead tuples and an almost
> full scan will be required anyway, so we don't bother to keep tracking such
> tables.
>
> Many optimizations should still remain at this point, but I'll make a
> not-so-complex suggestion in the meantime.

Actually, I have to agree with Heikki and Takayuki-san... I really like the
idea of managing DSM (and FSM for that matter) using shared_buffers. If we
do that, that means that we could probably back them to disk very easily.
--
Jim Nasby                    jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
Subject: Automatic adjustment of bgwriter_lru_maxpages (was: Dead Space Map version 2)
From: ITAGAKI Takahiro
"Jim C. Nasby" <jim@nasby.net> wrote: > > > Perhaps it would be better to have the bgwriter take a look at how many > > > dead tuples (or how much space the dead tuples account for) when it > > > writes a page out and adjust the DSM at that time. > > > > Yeah, I feel it is worth optimizable, too. One question is, how we treat > > dirty pages written by backends not by bgwriter? If we want to add some > > works in bgwriter, do we also need to make bgwriter to write almost of > > dirty pages? > > IMO yes, we want the bgwriter to be the only process that's normally > writing pages out. How close we are to that, I don't know... I'm working on making the bgwriter to write almost of dirty pages. This is the proposal for it using automatic adjustment of bgwriter_lru_maxpages. The bgwriter_lru_maxpages value will be adjusted to the equal number of calls of StrategyGetBuffer() per cycle with some safety margins (x2 at present). The counter are incremented per call and reset to zero at StrategySyncStart(). This patch alone is not so useful except for hiding hardly tunable parameters from users. However, it would be a first step of allow bgwriters to do some works before writing dirty buffers. - [DSM] Pick out pages worth vaccuming and register them into DSM. - [HOT] Do a per page vacuum for HOT updated tuples. (Is it worth doing?) - [TODO Item] Shrink expired COLD updated tuples to just their headers. - Set commit hint bits to reduce subsequent writes of blocks. http://archives.postgresql.org/pgsql-hackers/2007-01/msg01363.php I tested the attached patch on pgbench -s5 (80MB) with shared_buffers=32MB. I got an expected result as below. Over 75% of buffers are written by bgwriter. In addition , automatic adjusted bgwriter_lru_maxpages values were much higher than the default value (5). It shows that the most suitable values greatly depends on workloads. benchmark | throughput | cpu-usage | by-bgwriter | bgwriter_lru_maxpages ------------+------------+-----------+-------------+-----------------------default | 300tps | 100% | 77.5% | 120 pages/cyclewith sleep | 150tps | 50% | 98.6% | 70 pages/cycle I hope that this patch will be a first step of the intelligent bgwriter. Comments welcome. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
"Jim C. Nasby" <jim@nasby.net> wrote: > > * Aggressive freezing > > we will use OldestXmin as the threshold to freeze tuples in > > dirty pages or pages that have some dead tuples. Or, many UNFROZEN > > pages still remain after vacuum and they will cost us in the next > > vacuum preventing XID wraparound. > > Another good idea. If it's not too invasive I'd love to see that as a > stand-alone patch so that we know it can get in. This is a stand-alone patch for aggressive freezing. I'll propose to use OldestXmin instead of FreezeLimit as the freeze threshold in the circumstances below: - The page is already dirty. - There are another tuple to be frozen in the same page. - There are another dead tuples in the same page. Freezing is delayed until the heap vacuum phase. Anyway we create new dirty buffers and/or write WAL then, so additional freezing is almost free. Keeping the number of unfrozen tuples low, we can reduce the cost of next XID wraparound vacuum and piggyback multiple freezing operations in the same page. The following test shows differences of the number of unfrozen tuples with or without the patch. Formerly, recently inserted tuples are not frozen immediately (1). Even if there are some dead tuples in the same page, unfrozen live tuples are not frozen (2). With patch, the number after first vacuum was already low (3), because the pages including recently inserted tuples were dirty and not written yet, so aggressive freeze was performed for it. Moreover, if there are dead tuples in a page, other live tuples in the same page are also frozen (4). # CREATE CAST (xid AS integer) WITHOUT FUNCTION AS IMPLICIT; [without patch] $ ./pgbench -i -s1 (including vacuum) # SELECT count(*) FROM accounts WHERE xmin > 2; => 100000 (1) # UPDATE accounts SET aid = aid WHERE aid % 20 = 0; => UPDATE 5000 # SELECT count(*) FROM accounts WHERE xmin > 2; => 100000 # VACUUM accounts; # SELECT count(*) FROM accounts WHERE xmin > 2; => 100000 (2) [with patch] $ ./pgbench -i -s1 (including vacuum) # SELECT count(*) FROM accounts WHERE xmin > 2; => 2135 (3) # UPDATE accounts SET aid = aid WHERE aid % 20 = 0; => UPDATE 5000 # SELECT count(*) FROM accounts WHERE xmin > 2; => 7028 # VACUUM accounts; # SELECT count(*) FROM accounts WHERE xmin > 2; => 0 (4) Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> This is a stand-alone patch for aggressive freezing. I propose
> to use OldestXmin instead of FreezeLimit as the freeze threshold
> in the circumstances below:

I think it's a really bad idea to freeze that aggressively under any
circumstances except being told to (ie, VACUUM FREEZE). When you freeze,
you lose history information that might be needed later --- for forensic
purposes if nothing else. You need to show a fairly amazing performance
gain to justify that, and I don't think you can.

			regards, tom lane
Tom Lane wrote:
> ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
>> This is a stand-alone patch for aggressive freezing. I propose
>> to use OldestXmin instead of FreezeLimit as the freeze threshold
>> in the circumstances below:
>
> I think it's a really bad idea to freeze that aggressively under any
> circumstances except being told to (ie, VACUUM FREEZE). When you freeze,
> you lose history information that might be needed later --- for forensic
> purposes if nothing else. You need to show a fairly amazing performance
> gain to justify that, and I don't think you can.

There could be a GUC vacuum_freeze_limit, and the actual FreezeLimit
would be calculated as

    GetOldestXmin() - vacuum_freeze_limit

The default for vacuum_freeze_limit would be MaxTransactionId/2, just as
it is now.

greetings, Florian Pflug
Florian G. Pflug wrote:
> There could be a GUC vacuum_freeze_limit, and the actual FreezeLimit
> would be calculated as
>     GetOldestXmin() - vacuum_freeze_limit

We already have that. It's called vacuum_freeze_min_age, and the default
is 100 million transactions.

IIRC we added it late in the 8.2 release cycle when we changed the clog
truncation point to depend on the freeze limit.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Heikki Linnakangas wrote:
> Florian G. Pflug wrote:
>> There could be a GUC vacuum_freeze_limit, and the actual FreezeLimit
>> would be calculated as
>>     GetOldestXmin() - vacuum_freeze_limit
>
> We already have that. It's called vacuum_freeze_min_age, and the default
> is 100 million transactions.
>
> IIRC we added it late in the 8.2 release cycle when we changed the clog
> truncation point to depend on the freeze limit.

Ok, that explains why I didn't find it when I checked the source - I
checked the 8.1 sources by accident ;-)

Anyway, thanks for pointing that out ;-)

greetings, Florian Pflug
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> > This is a stand-alone patch for aggressive freezing. I propose
> > to use OldestXmin instead of FreezeLimit as the freeze threshold
> > in the circumstances below:
>
> I think it's a really bad idea to freeze that aggressively under any
> circumstances except being told to (ie, VACUUM FREEZE). When you freeze,
> you lose history information that might be needed later --- for forensic
> purposes if nothing else.

I don't think we can supply such historical-database functionality here,
because we can guarantee it only for INSERTed tuples even if we pay
attention. We already enable autovacuum by default, so we cannot predict
when the next vacuum starts, and recently UPDATEd and DELETEd tuples are
removed at random times. Furthermore, HOT will also accelerate the removal
of expired tuples. Instead, we'd better use WAL or something like audit
logs for keeping history information.

> You need to show a fairly amazing performance gain to justify that, and
> I don't think you can.

Thank you for your advice. I found that aggressive freezing of already
dirty pages made things worse, but that freezing pages which contain other
tuples being frozen, or dead tuples, was useful.

I did an accelerated test for XID-wraparound vacuum. I initialized the
database with

$ ./pgbench -i -s100
# VACUUM FREEZE accounts;
# SET vacuum_freeze_min_age = 6;

and repeated the following queries:

CHECKPOINT;
UPDATE accounts SET aid=aid WHERE random() < 0.005;
SELECT count(*) FROM accounts WHERE xmin > 2;
VACUUM accounts;

After the freeze threshold reached vacuum_freeze_min_age (run >= 3), the
VACUUM became faster with aggressive freezing. I think it came from
piggybacking multiple freezing operations -- the number of unfrozen tuples
was kept lower.

* Durations of VACUUM [sec]

 run |  HEAD  | freeze
-----+--------+--------
  1  |    5.8 |    8.2
  2  |    5.2 |    9.0
  3  |  118.2 |  102.0
  4  |  122.4 |   99.8
  5  |  121.0 |   79.8
  6  |  122.1 |   77.9
  7  |  123.8 |  115.5
-----+--------+--------
 avg |  121.5 |   95.0   (runs 3-7)

* Numbers of unfrozen tuples

 run |  HEAD  | freeze
-----+--------+--------
  1  |  50081 |  50434
  2  |  99836 | 100072
  3  | 100047 |  86484
  4  | 100061 |  86524
  5  |  99766 |  87046
  6  |  99854 |  86824
  7  |  99502 |  86595
-----+--------+--------
 avg |  99846 |  86695   (runs 3-7)

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
"ITAGAKI Takahiro" <itagaki.takahiro@oss.ntt.co.jp> writes: > I don't think we can supply such a historical database functionality here, > because we can guarantee it just only for INSERTed tuples even if we pay > attention. We've already enabled autovacuum as default, so that we cannot > predict when the next vacuum starts and recently UPDATEd and DELETEd tuples > are removed at random times. Furthermore, HOT will also accelerate removing > expired tuples. Instead, we'd better to use WAL or something like audit > logs for keeping history information. Well comparing the data to WAL is precisely the kind of debugging that I think Tom is concerned with. The hoped for gain here is that vacuum finds fewer pages with tuples that exceed vacuum_freeze_min_age? That seems useful though vacuum is still going to have to read every page and I suspect most of the writes pertain to dead tuples, not freezing tuples. This strikes me as something that will be more useful once we have the DSM especially if it ends up including a frozen map. Once we have the DSM vacuum will no longer be visiting every page, so it will be much easier for pages to get quite old and only be caught by a vacuum freeze. The less i/o that vacuum freeze has to do the better. If we get a freeze map then agressive freezing would help keep pages out of that map so they never need to be vacuumed just to freeze the tuples in them. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think it's a really bad idea to freeze that aggressively under any
>> circumstances except being told to (ie, VACUUM FREEZE). When you freeze,
>> you lose history information that might be needed later --- for forensic
>> purposes if nothing else.

> I don't think we can supply such historical-database functionality here,
> because we can guarantee it only for INSERTed tuples even if we pay
> attention. We already enable autovacuum by default, so we cannot predict
> when the next vacuum starts, and recently UPDATEd and DELETEd tuples are
> removed at random times.

I said nothing about expired tuples. The point of not freezing is to
preserve information about the insertion time of live tuples.

And your test case is unconvincing, because no sane DBA would run with
such a small value of vacuum_freeze_min_age.

			regards, tom lane
Gregory Stark <stark@enterprisedb.com> wrote:

> The hoped-for gain here is that vacuum finds fewer pages with tuples that
> exceed vacuum_freeze_min_age? That seems useful, though vacuum is still
> going to have to read every page, and I suspect most of the writes pertain
> to dead tuples, not freezing tuples.

Yes. VACUUM dirties pages only for freezing over-threshold tuples in
particular cases, and I think we can reduce those writes by keeping the
number of unfrozen tuples low. There are three additional costs in FREEZE:

1. CPU cost for changing the xids of target tuples.
2. Write cost for the WAL entries of FREEZE (log_heap_freeze).
3. Write cost for newly created dirty pages.

I did the additional freezing in the following two cases. We will have
created dirty buffers and WAL entries for the required operations then, so
I think the additional costs of 2 and 3 are negligible, though 1 still
affects us.

| - There is another tuple to be frozen in the same page.
| - There are other dead tuples in the same page.
|   (Freezing is delayed until the heap vacuum phase.)

> This strikes me as something that will be more useful once we have the DSM,
> especially if it ends up including a frozen map. Once we have the DSM,
> vacuum will no longer be visiting every page, so it will be much easier for
> pages to get quite old and only be caught by a vacuum freeze. The less i/o
> that vacuum freeze has to do the better. If we get a freeze map then
> aggressive freezing would help keep pages out of that map so they never
> need to be vacuumed just to freeze the tuples in them.

Yeah, I was planning the 2 bits/page DSM exactly for that purpose. One of
the bits means to-be-vacuumed and the other means to-be-frozen. It helps us
avoid full scans of the pages for XID-wraparound vacuums, but then the DSM
must be reliable and not lose any information. I made an attempt to
accomplish that in the DSM, but I understand the need to demonstrate to you
that it works as designed.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I said nothing about expired tuples. The point of not freezing is to
> preserve information about the insertion time of live tuples.

I don't know what good it will do -- for debugging? Why don't you use
CURRENT_TIMESTAMP?

> And your test case is unconvincing, because no sane DBA would run with
> such a small value of vacuum_freeze_min_age.

I intended to use that value for an accelerated test. The penalties of
freezing are spread over the long term in normal use, but we surely suffer
from them bit by bit.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I said nothing about expired tuples. The point of not freezing is to
>> preserve information about the insertion time of live tuples.

> I don't know what good it will do -- for debugging?

Exactly. As an example, I've been chasing offline a report from Merlin
Moncure about duplicate entries in a unique index; I still don't know what
exactly is going on there, but the availability of knowledge about which
transactions inserted which entries has been really helpful. If we had a
system designed to freeze tuples as soon as possible, that info would have
been gone forever pretty soon after the problem happened.

I don't say that this behavior can never be acceptable, but you need much
more than a marginal performance improvement to convince me that it's worth
the loss of forensic information.

			regards, tom lane