Re: Dead Space Map for vacuum - Mailing list pgsql-hackers
From | ITAGAKI Takahiro |
---|---|
Subject | Re: Dead Space Map for vacuum |
Date | |
Msg-id | 20070117142353.5AC5.ITAGAKI.TAKAHIRO@oss.ntt.co.jp |
In response to | Re: Dead Space Map for vacuum ("Simon Riggs" <simon@2ndquadrant.com>) |
List | pgsql-hackers |
I can see that there are two issues in the design of Dead Space Map in the recent discussions:

1. information accuracy of dead spaces
2. memory management

I'll write up the discussion about the first for now.

----

We need to track more status per page for effective vacuum. 1 bit per block is not enough.

"Simon Riggs" <simon@2ndquadrant.com> wrote:

> I would suggest that we tracked whether a block has had 0, 1 or 1+
> updates/deletes against it. When a block has 1+ it can then be
> worthwhile to VACUUM it and to place it onto the FSM. Two dead tuples is
> really the minimum space worth reclaiming on any block.

The suggestion is to classify pages by vacuum priority. There are 3 tracking statuses in the model.

[A1] Clean (all tuples in the page are frozen)
[A2] Low priority to vacuum
[A3] High priority to vacuum

In another discussion, there is an idea to avoid aggressive freezing. Normal VACUUM scans only pages marked in the B3 bitmap.

[B1] Clean
[B2] Unfrozen (some tuples need to be frozen)
[B3] Unvacuumed (some tuples need to be vacuumed)

Both of the above have only 3 statuses, so we can describe either of them in 2 bits. Since 2 bits can hold 4 values, I would suggest a 4-status DSM model:

[C1] Clean
[C2] Unfrozen (all tuples can be frozen, but are not yet)
[C3] Low priority to vacuum
[C4] High priority to vacuum

Pages receiving INSERT or after-UPDATE tuples are marked with C3 status -- those tuples only need to be frozen on commit. On the other hand, pages with DELETE or before-UPDATE tuples are marked with C4 status -- those tuples are to be vacuumed on commit. If the transaction is rolled back, the need for freeze/vacuum is inverted, but we can assume that COMMIT is more common than ROLLBACK.

We can lower the priority from C4 to C3 for pages that have too little free space to reuse, as in the original idea by Simon. We can then refer to C3 status to find that the page has had 0 or 1 dead tuples. Whether to mark C3 or C4 is an optimization issue.

We need to add two new VACUUM modes that use the Dead Space Map. Most users and autovacuum would use only mode 5.

1. VACUUM FULL        (scan all pages)
2. VACUUM FREEZE ALL  (scan all pages)
3. VACUUM ALL         (scan all pages)
4. VACUUM FREEZE      (scan C2,3,4)
5. VACUUM             (scan only C4 basically)

VACUUM downgrades the status of scanned pages from C4 to another status. If a page has any dead tuples, VACUUM tries to freeze all tuples in the page and change its status to C1 (Clean); because the page becomes dirty anyway, freezing is almost free (no additional I/Os). When unfrozen or unvacuumed tuples remain, the status becomes C2 or C3.

Normal VACUUM (with DSM) basically scans only pages marked with C4 status, but it may be good to vacuum other pages in some cases: in maintenance windows, when we can retrieve several pages in one disk read, etc. This is also an optimization issue.

Any ideas?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
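
As an illustration of the 4-status model and the five VACUUM modes described in the message, here is a minimal Python sketch of a 2-bits-per-page map. It is not PostgreSQL code and not the author's implementation: the names `Status`, `DeadSpaceMap`, `mark`, `pages_to_scan` and `SCAN_SETS` are hypothetical, and the rule that marking never lowers a page's status is an assumption added for the example (the message only says that VACUUM downgrades statuses).

```python
from enum import IntEnum


class Status(IntEnum):
    CLEAN = 0      # C1
    UNFROZEN = 1   # C2
    LOW = 2        # C3: low priority to vacuum
    HIGH = 3       # C4: high priority to vacuum


class DeadSpaceMap:
    """Hypothetical map storing one 2-bit status per page."""

    def __init__(self, npages):
        self.npages = npages
        self.bits = bytearray((npages + 3) // 4)  # 4 pages per byte

    def get(self, page):
        shift = (page % 4) * 2
        return Status((self.bits[page // 4] >> shift) & 0b11)

    def set(self, page, status):
        shift = (page % 4) * 2
        cleared = self.bits[page // 4] & ~(0b11 << shift) & 0xFF
        self.bits[page // 4] = cleared | (int(status) << shift)

    def mark(self, page, status):
        # Assumption: marking only raises a status; lowering is left to VACUUM.
        if status > self.get(page):
            self.set(page, status)

    def pages_to_scan(self, mode):
        wanted = SCAN_SETS[mode]
        return [p for p in range(self.npages) if self.get(p) in wanted]


# Which statuses each mode scans, following the list of modes in the message.
ALL = set(Status)
SCAN_SETS = {
    "VACUUM FULL": ALL,
    "VACUUM FREEZE ALL": ALL,
    "VACUUM ALL": ALL,
    "VACUUM FREEZE": {Status.UNFROZEN, Status.LOW, Status.HIGH},
    "VACUUM": {Status.HIGH},
}

dsm = DeadSpaceMap(8)
dsm.mark(2, Status.LOW)    # page received an INSERT or after-UPDATE tuple
dsm.mark(5, Status.HIGH)   # page has a DELETE or before-UPDATE tuple
print(dsm.pages_to_scan("VACUUM"))
print(dsm.pages_to_scan("VACUUM FREEZE"))
```

In this sketch, plain VACUUM visits only the page marked C4, while VACUUM FREEZE also visits the C3 page; the map costs one byte per four pages.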